Nursing Home Staffing Exploratory Data Analysis¶

Lets analyze data from CMS to understand nursing home staffing levels and improving care at these facilities¶

Importing libraries below:¶
In [1]:
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns 
In [ ]:
cd data/
Lets load payroll based journal data for Q2 2024¶

Additional data can be found in the following link

In [ ]:
pbj = pd.read_csv('PBJ_Daily_Nurse_Staffing_Q2_2024.csv', encoding='iso-8859-1')
Lets check if everything imported correctly and no missing data in our dataset¶
In [4]:
pbj.head()
Out[4]:
PROVNUM PROVNAME CITY STATE COUNTY_NAME COUNTY_FIPS CY_Qtr WorkDate MDScensus Hrs_RNDON ... Hrs_LPN_ctr Hrs_CNA Hrs_CNA_emp Hrs_CNA_ctr Hrs_NAtrn Hrs_NAtrn_emp Hrs_NAtrn_ctr Hrs_MedAide Hrs_MedAide_emp Hrs_MedAide_ctr
0 15009 BURNS NURSING HOME, INC. RUSSELLVILLE AL Franklin 59 2024Q2 20240401 51 10.77 ... 0.0 160.08 160.08 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 15009 BURNS NURSING HOME, INC. RUSSELLVILLE AL Franklin 59 2024Q2 20240402 52 8.43 ... 0.0 135.95 135.95 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 15009 BURNS NURSING HOME, INC. RUSSELLVILLE AL Franklin 59 2024Q2 20240403 53 11.13 ... 0.0 150.31 150.31 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 15009 BURNS NURSING HOME, INC. RUSSELLVILLE AL Franklin 59 2024Q2 20240404 52 12.27 ... 0.0 133.01 133.01 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 15009 BURNS NURSING HOME, INC. RUSSELLVILLE AL Franklin 59 2024Q2 20240405 52 4.95 ... 0.0 137.92 137.92 0.0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 33 columns

In [5]:
pbj.shape
Out[5]:
(1325324, 33)
In [6]:
pbj.columns
Out[6]:
Index(['PROVNUM', 'PROVNAME', 'CITY', 'STATE', 'COUNTY_NAME', 'COUNTY_FIPS',
       'CY_Qtr', 'WorkDate', 'MDScensus', 'Hrs_RNDON', 'Hrs_RNDON_emp',
       'Hrs_RNDON_ctr', 'Hrs_RNadmin', 'Hrs_RNadmin_emp', 'Hrs_RNadmin_ctr',
       'Hrs_RN', 'Hrs_RN_emp', 'Hrs_RN_ctr', 'Hrs_LPNadmin',
       'Hrs_LPNadmin_emp', 'Hrs_LPNadmin_ctr', 'Hrs_LPN', 'Hrs_LPN_emp',
       'Hrs_LPN_ctr', 'Hrs_CNA', 'Hrs_CNA_emp', 'Hrs_CNA_ctr', 'Hrs_NAtrn',
       'Hrs_NAtrn_emp', 'Hrs_NAtrn_ctr', 'Hrs_MedAide', 'Hrs_MedAide_emp',
       'Hrs_MedAide_ctr'],
      dtype='object')
In [7]:
pbj.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1325324 entries, 0 to 1325323
Data columns (total 33 columns):
 #   Column            Non-Null Count    Dtype  
---  ------            --------------    -----  
 0   PROVNUM           1325324 non-null  object 
 1   PROVNAME          1325324 non-null  object 
 2   CITY              1325324 non-null  object 
 3   STATE             1325324 non-null  object 
 4   COUNTY_NAME       1325324 non-null  object 
 5   COUNTY_FIPS       1325324 non-null  int64  
 6   CY_Qtr            1325324 non-null  object 
 7   WorkDate          1325324 non-null  int64  
 8   MDScensus         1325324 non-null  int64  
 9   Hrs_RNDON         1325324 non-null  float64
 10  Hrs_RNDON_emp     1325324 non-null  float64
 11  Hrs_RNDON_ctr     1325324 non-null  float64
 12  Hrs_RNadmin       1325324 non-null  float64
 13  Hrs_RNadmin_emp   1325324 non-null  float64
 14  Hrs_RNadmin_ctr   1325324 non-null  float64
 15  Hrs_RN            1325324 non-null  float64
 16  Hrs_RN_emp        1325324 non-null  float64
 17  Hrs_RN_ctr        1325324 non-null  float64
 18  Hrs_LPNadmin      1325324 non-null  float64
 19  Hrs_LPNadmin_emp  1325324 non-null  float64
 20  Hrs_LPNadmin_ctr  1325324 non-null  float64
 21  Hrs_LPN           1325324 non-null  float64
 22  Hrs_LPN_emp       1325324 non-null  float64
 23  Hrs_LPN_ctr       1325324 non-null  float64
 24  Hrs_CNA           1325324 non-null  float64
 25  Hrs_CNA_emp       1325324 non-null  float64
 26  Hrs_CNA_ctr       1325324 non-null  float64
 27  Hrs_NAtrn         1325324 non-null  float64
 28  Hrs_NAtrn_emp     1325324 non-null  float64
 29  Hrs_NAtrn_ctr     1325324 non-null  float64
 30  Hrs_MedAide       1325324 non-null  float64
 31  Hrs_MedAide_emp   1325324 non-null  float64
 32  Hrs_MedAide_ctr   1325324 non-null  float64
dtypes: float64(24), int64(3), object(6)
memory usage: 333.7+ MB
In [8]:
pbj.describe()
Out[8]:
COUNTY_FIPS WorkDate MDScensus Hrs_RNDON Hrs_RNDON_emp Hrs_RNDON_ctr Hrs_RNadmin Hrs_RNadmin_emp Hrs_RNadmin_ctr Hrs_RN ... Hrs_LPN_ctr Hrs_CNA Hrs_CNA_emp Hrs_CNA_ctr Hrs_NAtrn Hrs_NAtrn_emp Hrs_NAtrn_ctr Hrs_MedAide Hrs_MedAide_emp Hrs_MedAide_ctr
count 1.325324e+06 1.325324e+06 1.325324e+06 1.325324e+06 1.325324e+06 1.325324e+06 1.325324e+06 1.325324e+06 1.325324e+06 1.325324e+06 ... 1.325324e+06 1.325324e+06 1.325324e+06 1.325324e+06 1.325324e+06 1.325324e+06 1.325324e+06 1.325324e+06 1.325324e+06 1.325324e+06
mean 9.113430e+01 2.024052e+07 8.341670e+01 5.195887e+00 5.099312e+00 9.657448e-02 1.026275e+01 1.001164e+01 2.511122e-01 3.480192e+01 ... 6.133439e+00 1.737908e+02 1.616093e+02 1.218150e+01 4.314155e+00 4.262067e+00 5.208815e-02 8.578327e+00 8.363303e+00 2.150232e-01
std 9.912806e+01 8.167078e+01 4.904109e+01 4.389664e+00 4.382255e+00 9.182349e-01 1.476875e+01 1.455953e+01 1.845270e+00 3.532269e+01 ... 1.963754e+01 1.151591e+02 1.081243e+02 3.184142e+01 1.336067e+01 1.302080e+01 2.011555e+00 1.785970e+01 1.749784e+01 2.104964e+00
min 1.000000e+00 2.024040e+07 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 ... 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
25% 3.100000e+01 2.024042e+07 5.100000e+01 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.325000e+01 ... 0.000000e+00 9.857000e+01 9.025000e+01 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
50% 6.900000e+01 2.024052e+07 7.600000e+01 8.000000e+00 8.000000e+00 0.000000e+00 7.500000e+00 7.500000e+00 0.000000e+00 2.575000e+01 ... 0.000000e+00 1.506200e+02 1.400000e+02 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
75% 1.170000e+02 2.024061e+07 1.040000e+02 8.000000e+00 8.000000e+00 0.000000e+00 1.600000e+01 1.600000e+01 0.000000e+00 4.525000e+01 ... 4.000000e+00 2.200000e+02 2.074800e+02 8.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.131000e+01 1.087000e+01 0.000000e+00
max 8.400000e+02 2.024063e+07 7.420000e+02 1.039600e+02 1.039600e+02 2.575000e+01 2.755000e+02 2.755000e+02 8.725000e+01 9.159800e+02 ... 1.380150e+04 1.758100e+03 1.503630e+03 6.614200e+02 4.437500e+02 4.387500e+02 2.470000e+02 4.298000e+02 4.298000e+02 8.775000e+01

8 rows × 27 columns

In [9]:
pbj.isnull().sum().sort_values(ascending=False)
   
Out[9]:
PROVNUM             0
PROVNAME            0
CITY                0
STATE               0
COUNTY_NAME         0
COUNTY_FIPS         0
CY_Qtr              0
WorkDate            0
MDScensus           0
Hrs_RNDON           0
Hrs_RNDON_emp       0
Hrs_RNDON_ctr       0
Hrs_RNadmin         0
Hrs_RNadmin_emp     0
Hrs_RNadmin_ctr     0
Hrs_RN              0
Hrs_RN_emp          0
Hrs_RN_ctr          0
Hrs_LPNadmin        0
Hrs_LPNadmin_emp    0
Hrs_LPNadmin_ctr    0
Hrs_LPN             0
Hrs_LPN_emp         0
Hrs_LPN_ctr         0
Hrs_CNA             0
Hrs_CNA_emp         0
Hrs_CNA_ctr         0
Hrs_NAtrn           0
Hrs_NAtrn_emp       0
Hrs_NAtrn_ctr       0
Hrs_MedAide         0
Hrs_MedAide_emp     0
Hrs_MedAide_ctr     0
dtype: int64
In [10]:
pbj.isna().sum().sort_values(ascending=False)
Out[10]:
PROVNUM             0
PROVNAME            0
CITY                0
STATE               0
COUNTY_NAME         0
COUNTY_FIPS         0
CY_Qtr              0
WorkDate            0
MDScensus           0
Hrs_RNDON           0
Hrs_RNDON_emp       0
Hrs_RNDON_ctr       0
Hrs_RNadmin         0
Hrs_RNadmin_emp     0
Hrs_RNadmin_ctr     0
Hrs_RN              0
Hrs_RN_emp          0
Hrs_RN_ctr          0
Hrs_LPNadmin        0
Hrs_LPNadmin_emp    0
Hrs_LPNadmin_ctr    0
Hrs_LPN             0
Hrs_LPN_emp         0
Hrs_LPN_ctr         0
Hrs_CNA             0
Hrs_CNA_emp         0
Hrs_CNA_ctr         0
Hrs_NAtrn           0
Hrs_NAtrn_emp       0
Hrs_NAtrn_ctr       0
Hrs_MedAide         0
Hrs_MedAide_emp     0
Hrs_MedAide_ctr     0
dtype: int64
Everything looks great. The data has imported correctly. No missing data. Reviewing the columns we can see what is stored in the dataset and since we're interested in looking at staffing levels lets isolate those columns¶
In [11]:
pbj_copy = pbj[['PROVNUM', 'PROVNAME', 'CITY', 'STATE', 'COUNTY_NAME',
       'CY_Qtr', 'WorkDate', 'MDScensus', 'Hrs_RNDON', 'Hrs_RNDON_emp',
       'Hrs_RNDON_ctr', 'Hrs_RNadmin', 'Hrs_RNadmin_emp', 'Hrs_RNadmin_ctr',
       'Hrs_RN', 'Hrs_RN_emp', 'Hrs_RN_ctr', 'Hrs_LPNadmin',
       'Hrs_LPNadmin_emp', 'Hrs_LPNadmin_ctr', 'Hrs_LPN', 'Hrs_LPN_emp',
       'Hrs_LPN_ctr', 'Hrs_CNA', 'Hrs_CNA_emp', 'Hrs_CNA_ctr', 'Hrs_NAtrn',
       'Hrs_NAtrn_emp', 'Hrs_NAtrn_ctr', 'Hrs_MedAide', 'Hrs_MedAide_emp',
       'Hrs_MedAide_ctr']]
In [12]:
pbj_copy.isna().sum().sort_values(ascending=False)
Out[12]:
PROVNUM             0
PROVNAME            0
CITY                0
STATE               0
COUNTY_NAME         0
CY_Qtr              0
WorkDate            0
MDScensus           0
Hrs_RNDON           0
Hrs_RNDON_emp       0
Hrs_RNDON_ctr       0
Hrs_RNadmin         0
Hrs_RNadmin_emp     0
Hrs_RNadmin_ctr     0
Hrs_RN              0
Hrs_RN_emp          0
Hrs_RN_ctr          0
Hrs_LPNadmin        0
Hrs_LPNadmin_emp    0
Hrs_LPNadmin_ctr    0
Hrs_LPN             0
Hrs_LPN_emp         0
Hrs_LPN_ctr         0
Hrs_CNA             0
Hrs_CNA_emp         0
Hrs_CNA_ctr         0
Hrs_NAtrn           0
Hrs_NAtrn_emp       0
Hrs_NAtrn_ctr       0
Hrs_MedAide         0
Hrs_MedAide_emp     0
Hrs_MedAide_ctr     0
dtype: int64
In [13]:
pbj_copy
Out[13]:
PROVNUM PROVNAME CITY STATE COUNTY_NAME CY_Qtr WorkDate MDScensus Hrs_RNDON Hrs_RNDON_emp ... Hrs_LPN_ctr Hrs_CNA Hrs_CNA_emp Hrs_CNA_ctr Hrs_NAtrn Hrs_NAtrn_emp Hrs_NAtrn_ctr Hrs_MedAide Hrs_MedAide_emp Hrs_MedAide_ctr
0 15009 BURNS NURSING HOME, INC. RUSSELLVILLE AL Franklin 2024Q2 20240401 51 10.77 10.77 ... 0.0 160.08 160.08 0.0 0.0 0.0 0.0 0.00 0.00 0.0
1 15009 BURNS NURSING HOME, INC. RUSSELLVILLE AL Franklin 2024Q2 20240402 52 8.43 8.43 ... 0.0 135.95 135.95 0.0 0.0 0.0 0.0 0.00 0.00 0.0
2 15009 BURNS NURSING HOME, INC. RUSSELLVILLE AL Franklin 2024Q2 20240403 53 11.13 11.13 ... 0.0 150.31 150.31 0.0 0.0 0.0 0.0 0.00 0.00 0.0
3 15009 BURNS NURSING HOME, INC. RUSSELLVILLE AL Franklin 2024Q2 20240404 52 12.27 12.27 ... 0.0 133.01 133.01 0.0 0.0 0.0 0.0 0.00 0.00 0.0
4 15009 BURNS NURSING HOME, INC. RUSSELLVILLE AL Franklin 2024Q2 20240405 52 4.95 4.95 ... 0.0 137.92 137.92 0.0 0.0 0.0 0.0 0.00 0.00 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1325319 745040 THE SARAH ROBERTS FRENCH HOME SAN ANTONIO TX Bexar 2024Q2 20240626 44 8.00 8.00 ... 0.0 93.99 93.99 0.0 0.0 0.0 0.0 15.60 15.60 0.0
1325320 745040 THE SARAH ROBERTS FRENCH HOME SAN ANTONIO TX Bexar 2024Q2 20240627 44 8.00 8.00 ... 0.0 85.99 85.99 0.0 0.0 0.0 0.0 13.70 13.70 0.0
1325321 745040 THE SARAH ROBERTS FRENCH HOME SAN ANTONIO TX Bexar 2024Q2 20240628 45 8.00 8.00 ... 0.0 74.05 74.05 0.0 0.0 0.0 0.0 17.18 17.18 0.0
1325322 745040 THE SARAH ROBERTS FRENCH HOME SAN ANTONIO TX Bexar 2024Q2 20240629 45 0.00 0.00 ... 0.0 55.44 55.44 0.0 0.0 0.0 0.0 20.92 20.92 0.0
1325323 745040 THE SARAH ROBERTS FRENCH HOME SAN ANTONIO TX Bexar 2024Q2 20240630 45 0.00 0.00 ... 0.0 66.81 66.81 0.0 0.0 0.0 0.0 13.58 13.58 0.0

1325324 rows × 32 columns

Now we can look at additional nursing home data. This is broken down by month so we import Apr, May and data for 2024 - eventually we will merge this into a single dataset for Q2 2024 data. The data can be found in the following link¶
In [ ]:
cd nursing_homes_including_rehab_services_04_2024
In [15]:
nhp_apr = pd.read_csv('NH_ProviderInfo_Apr2024.csv')
In [16]:
nhp_apr.columns
Out[16]:
Index(['CMS Certification Number (CCN)', 'Provider Name', 'Provider Address',
       'City/Town', 'State', 'ZIP Code', 'Telephone Number',
       'Provider SSA County Code', 'County/Parish', 'Ownership Type',
       'Number of Certified Beds', 'Average Number of Residents per Day',
       'Average Number of Residents per Day Footnote', 'Provider Type',
       'Provider Resides in Hospital', 'Legal Business Name',
       'Date First Approved to Provide Medicare and Medicaid Services',
       'Affiliated Entity Name', 'Affiliated Entity ID',
       'Continuing Care Retirement Community', 'Special Focus Status',
       'Abuse Icon', 'Most Recent Health Inspection More Than 2 Years Ago',
       'Provider Changed Ownership in Last 12 Months',
       'With a Resident and Family Council',
       'Automatic Sprinkler Systems in All Required Areas', 'Overall Rating',
       'Overall Rating Footnote', 'Health Inspection Rating',
       'Health Inspection Rating Footnote', 'QM Rating', 'QM Rating Footnote',
       'Long-Stay QM Rating', 'Long-Stay QM Rating Footnote',
       'Short-Stay QM Rating', 'Short-Stay QM Rating Footnote',
       'Staffing Rating', 'Staffing Rating Footnote',
       'Reported Staffing Footnote', 'Physical Therapist Staffing Footnote',
       'Reported Nurse Aide Staffing Hours per Resident per Day',
       'Reported LPN Staffing Hours per Resident per Day',
       'Reported RN Staffing Hours per Resident per Day',
       'Reported Licensed Staffing Hours per Resident per Day',
       'Reported Total Nurse Staffing Hours per Resident per Day',
       'Total number of nurse staff hours per resident per day on the weekend',
       'Registered Nurse hours per resident per day on the weekend',
       'Reported Physical Therapist Staffing Hours per Resident Per Day',
       'Total nursing staff turnover', 'Total nursing staff turnover footnote',
       'Registered Nurse turnover', 'Registered Nurse turnover footnote',
       'Number of administrators who have left the nursing home',
       'Administrator turnover footnote',
       'Case-Mix Nurse Aide Staffing Hours per Resident per Day',
       'Case-Mix LPN Staffing Hours per Resident per Day',
       'Case-Mix RN Staffing Hours per Resident per Day',
       'Case-Mix Total Nurse Staffing Hours per Resident per Day',
       'Adjusted Nurse Aide Staffing Hours per Resident per Day',
       'Adjusted LPN Staffing Hours per Resident per Day',
       'Adjusted RN Staffing Hours per Resident per Day',
       'Adjusted Total Nurse Staffing Hours per Resident per Day',
       'Adjusted Weekend Total Nurse Staffing Hours per Resident per Day',
       'Rating Cycle 1 Standard Survey Health Date',
       'Rating Cycle 1 Total Number of Health Deficiencies',
       'Rating Cycle 1 Number of Standard Health Deficiencies',
       'Rating Cycle 1 Number of Complaint Health Deficiencies',
       'Rating Cycle 1 Health Deficiency Score',
       'Rating Cycle 1 Number of Health Revisits',
       'Rating Cycle 1 Health Revisit Score',
       'Rating Cycle 1 Total Health Score',
       'Rating Cycle 2 Standard Health Survey Date',
       'Rating Cycle 2 Total Number of Health Deficiencies',
       'Rating Cycle 2 Number of Standard Health Deficiencies',
       'Rating Cycle 2 Number of Complaint Health Deficiencies',
       'Rating Cycle 2 Health Deficiency Score',
       'Rating Cycle 2 Number of Health Revisits',
       'Rating Cycle 2 Health Revisit Score',
       'Rating Cycle 2 Total Health Score',
       'Rating Cycle 3 Standard Health Survey Date',
       'Rating Cycle 3 Total Number of Health Deficiencies',
       'Rating Cycle 3 Number of Standard Health Deficiencies',
       'Rating Cycle 3 Number of Complaint Health Deficiencies',
       'Rating Cycle 3 Health Deficiency Score',
       'Rating Cycle 3 Number of Health Revisits',
       'Rating Cycle 3 Health Revisit Score',
       'Rating Cycle 3 Total Health Score',
       'Total Weighted Health Survey Score',
       'Number of Facility Reported Incidents',
       'Number of Substantiated Complaints',
       'Number of Citations from Infection Control Inspections',
       'Number of Fines', 'Total Amount of Fines in Dollars',
       'Number of Payment Denials', 'Total Number of Penalties', 'Location',
       'Latitude', 'Longitude', 'Geocoding Footnote', 'Processing Date'],
      dtype='object')
In [ ]:
cd .. 
In [ ]:
cd nursing_homes_including_rehab_services_05_2024/
In [19]:
nhp_may = pd.read_csv('NH_ProviderInfo_May2024.csv')
In [ ]:
cd .. 
In [ ]:
cd nursing_homes_including_rehab_services_06_2024/
In [22]:
nhp_jun = pd.read_csv('NH_ProviderInfo_Jun2024.csv')
Lets create a Q2 2024 dataset by merging the Apr, May and June 2024 Data. This will combine to create our Q2 dataset¶
In [23]:
nhp_q2 = pd.concat([nhp_apr, nhp_may, nhp_jun])
In [24]:
nhp_q2.columns
Out[24]:
Index(['CMS Certification Number (CCN)', 'Provider Name', 'Provider Address',
       'City/Town', 'State', 'ZIP Code', 'Telephone Number',
       'Provider SSA County Code', 'County/Parish', 'Ownership Type',
       'Number of Certified Beds', 'Average Number of Residents per Day',
       'Average Number of Residents per Day Footnote', 'Provider Type',
       'Provider Resides in Hospital', 'Legal Business Name',
       'Date First Approved to Provide Medicare and Medicaid Services',
       'Affiliated Entity Name', 'Affiliated Entity ID',
       'Continuing Care Retirement Community', 'Special Focus Status',
       'Abuse Icon', 'Most Recent Health Inspection More Than 2 Years Ago',
       'Provider Changed Ownership in Last 12 Months',
       'With a Resident and Family Council',
       'Automatic Sprinkler Systems in All Required Areas', 'Overall Rating',
       'Overall Rating Footnote', 'Health Inspection Rating',
       'Health Inspection Rating Footnote', 'QM Rating', 'QM Rating Footnote',
       'Long-Stay QM Rating', 'Long-Stay QM Rating Footnote',
       'Short-Stay QM Rating', 'Short-Stay QM Rating Footnote',
       'Staffing Rating', 'Staffing Rating Footnote',
       'Reported Staffing Footnote', 'Physical Therapist Staffing Footnote',
       'Reported Nurse Aide Staffing Hours per Resident per Day',
       'Reported LPN Staffing Hours per Resident per Day',
       'Reported RN Staffing Hours per Resident per Day',
       'Reported Licensed Staffing Hours per Resident per Day',
       'Reported Total Nurse Staffing Hours per Resident per Day',
       'Total number of nurse staff hours per resident per day on the weekend',
       'Registered Nurse hours per resident per day on the weekend',
       'Reported Physical Therapist Staffing Hours per Resident Per Day',
       'Total nursing staff turnover', 'Total nursing staff turnover footnote',
       'Registered Nurse turnover', 'Registered Nurse turnover footnote',
       'Number of administrators who have left the nursing home',
       'Administrator turnover footnote',
       'Case-Mix Nurse Aide Staffing Hours per Resident per Day',
       'Case-Mix LPN Staffing Hours per Resident per Day',
       'Case-Mix RN Staffing Hours per Resident per Day',
       'Case-Mix Total Nurse Staffing Hours per Resident per Day',
       'Adjusted Nurse Aide Staffing Hours per Resident per Day',
       'Adjusted LPN Staffing Hours per Resident per Day',
       'Adjusted RN Staffing Hours per Resident per Day',
       'Adjusted Total Nurse Staffing Hours per Resident per Day',
       'Adjusted Weekend Total Nurse Staffing Hours per Resident per Day',
       'Rating Cycle 1 Standard Survey Health Date',
       'Rating Cycle 1 Total Number of Health Deficiencies',
       'Rating Cycle 1 Number of Standard Health Deficiencies',
       'Rating Cycle 1 Number of Complaint Health Deficiencies',
       'Rating Cycle 1 Health Deficiency Score',
       'Rating Cycle 1 Number of Health Revisits',
       'Rating Cycle 1 Health Revisit Score',
       'Rating Cycle 1 Total Health Score',
       'Rating Cycle 2 Standard Health Survey Date',
       'Rating Cycle 2 Total Number of Health Deficiencies',
       'Rating Cycle 2 Number of Standard Health Deficiencies',
       'Rating Cycle 2 Number of Complaint Health Deficiencies',
       'Rating Cycle 2 Health Deficiency Score',
       'Rating Cycle 2 Number of Health Revisits',
       'Rating Cycle 2 Health Revisit Score',
       'Rating Cycle 2 Total Health Score',
       'Rating Cycle 3 Standard Health Survey Date',
       'Rating Cycle 3 Total Number of Health Deficiencies',
       'Rating Cycle 3 Number of Standard Health Deficiencies',
       'Rating Cycle 3 Number of Complaint Health Deficiencies',
       'Rating Cycle 3 Health Deficiency Score',
       'Rating Cycle 3 Number of Health Revisits',
       'Rating Cycle 3 Health Revisit Score',
       'Rating Cycle 3 Total Health Score',
       'Total Weighted Health Survey Score',
       'Number of Facility Reported Incidents',
       'Number of Substantiated Complaints',
       'Number of Citations from Infection Control Inspections',
       'Number of Fines', 'Total Amount of Fines in Dollars',
       'Number of Payment Denials', 'Total Number of Penalties', 'Location',
       'Latitude', 'Longitude', 'Geocoding Footnote', 'Processing Date'],
      dtype='object')
In [25]:
nhp_q2.head()
Out[25]:
CMS Certification Number (CCN) Provider Name Provider Address City/Town State ZIP Code Telephone Number Provider SSA County Code County/Parish Ownership Type ... Number of Citations from Infection Control Inspections Number of Fines Total Amount of Fines in Dollars Number of Payment Denials Total Number of Penalties Location Latitude Longitude Geocoding Footnote Processing Date
0 015009 BURNS NURSING HOME, INC. 701 MONROE STREET NW RUSSELLVILLE AL 35653 2563324110 290 Franklin For profit - Corporation ... NaN 2 24644.14 0 2 701 MONROE STREET NW,RUSSELLVILLE,AL,35653 34.5149 -87.736 NaN 2024-04-01
1 015010 COOSA VALLEY HEALTHCARE CENTER 260 WEST WALNUT STREET SYLACAUGA AL 35150 2562495604 600 Talladega For profit - Corporation ... 0.0 0 0.00 0 0 260 WEST WALNUT STREET,SYLACAUGA,AL,35150 33.1637 -86.254 NaN 2024-04-01
2 015012 HIGHLANDS HEALTH AND REHAB 380 WOODS COVE ROAD SCOTTSBORO AL 35768 2562183708 350 Jackson Government - County ... NaN 0 0.00 0 0 380 WOODS COVE ROAD,SCOTTSBORO,AL,35768 34.6611 -86.047 NaN 2024-04-01
3 015014 EASTVIEW REHABILITATION & HEALTHCARE CENTER 7755 FOURTH AVENUE SOUTH BIRMINGHAM AL 35206 2058330146 360 Jefferson For profit - Individual ... 0.0 0 0.00 0 0 7755 FOURTH AVENUE SOUTH,BIRMINGHAM,AL,35206 33.5595 -86.722 NaN 2024-04-01
4 015015 PLANTATION MANOR NURSING HOME 6450 OLD TUSCALOOSA HIGHWAY MC CALLA AL 35111 2054776161 360 Jefferson For profit - Individual ... NaN 1 975.00 0 1 6450 OLD TUSCALOOSA HIGHWAY,MC CALLA,AL,35111 33.3221 -87.034 NaN 2024-04-01

5 rows × 100 columns

In [26]:
nhp_q2_clean = nhp_q2[['Provider Name', 'Provider Address',
       'City/Town', 'State',
       'County/Parish', 'Ownership Type',
       'Number of Certified Beds', 'Average Number of Residents per Day', 'Provider Type',
       'Provider Resides in Hospital', 'Legal Business Name',
       'Date First Approved to Provide Medicare and Medicaid Services',
       'Continuing Care Retirement Community',
       'Abuse Icon',
       'Overall Rating',
       'Health Inspection Rating',
       'QM Rating',
       'Long-Stay QM Rating',
       'Staffing Rating',
       'Reported Nurse Aide Staffing Hours per Resident per Day',
       'Reported LPN Staffing Hours per Resident per Day',
       'Reported RN Staffing Hours per Resident per Day',
       'Reported Licensed Staffing Hours per Resident per Day',
       'Reported Total Nurse Staffing Hours per Resident per Day',
       'Total number of nurse staff hours per resident per day on the weekend',
       'Registered Nurse hours per resident per day on the weekend',
       'Reported Physical Therapist Staffing Hours per Resident Per Day',
       'Case-Mix Nurse Aide Staffing Hours per Resident per Day',
       'Case-Mix LPN Staffing Hours per Resident per Day',
       'Case-Mix RN Staffing Hours per Resident per Day',
       'Case-Mix Total Nurse Staffing Hours per Resident per Day',
       'Adjusted Nurse Aide Staffing Hours per Resident per Day',
       'Adjusted LPN Staffing Hours per Resident per Day',
       'Adjusted RN Staffing Hours per Resident per Day',
       'Adjusted Total Nurse Staffing Hours per Resident per Day',
       'Adjusted Weekend Total Nurse Staffing Hours per Resident per Day',
       'Total Weighted Health Survey Score',
       'Number of Facility Reported Incidents',
       'Number of Substantiated Complaints',
       'Number of Fines', 'Total Amount of Fines in Dollars',
       'Number of Payment Denials', 'Total Number of Penalties']]
In [27]:
nhp_q2_clean.shape
Out[27]:
(44563, 43)
In [28]:
nhp_q2_clean.columns
Out[28]:
Index(['Provider Name', 'Provider Address', 'City/Town', 'State',
       'County/Parish', 'Ownership Type', 'Number of Certified Beds',
       'Average Number of Residents per Day', 'Provider Type',
       'Provider Resides in Hospital', 'Legal Business Name',
       'Date First Approved to Provide Medicare and Medicaid Services',
       'Continuing Care Retirement Community', 'Abuse Icon', 'Overall Rating',
       'Health Inspection Rating', 'QM Rating', 'Long-Stay QM Rating',
       'Staffing Rating',
       'Reported Nurse Aide Staffing Hours per Resident per Day',
       'Reported LPN Staffing Hours per Resident per Day',
       'Reported RN Staffing Hours per Resident per Day',
       'Reported Licensed Staffing Hours per Resident per Day',
       'Reported Total Nurse Staffing Hours per Resident per Day',
       'Total number of nurse staff hours per resident per day on the weekend',
       'Registered Nurse hours per resident per day on the weekend',
       'Reported Physical Therapist Staffing Hours per Resident Per Day',
       'Case-Mix Nurse Aide Staffing Hours per Resident per Day',
       'Case-Mix LPN Staffing Hours per Resident per Day',
       'Case-Mix RN Staffing Hours per Resident per Day',
       'Case-Mix Total Nurse Staffing Hours per Resident per Day',
       'Adjusted Nurse Aide Staffing Hours per Resident per Day',
       'Adjusted LPN Staffing Hours per Resident per Day',
       'Adjusted RN Staffing Hours per Resident per Day',
       'Adjusted Total Nurse Staffing Hours per Resident per Day',
       'Adjusted Weekend Total Nurse Staffing Hours per Resident per Day',
       'Total Weighted Health Survey Score',
       'Number of Facility Reported Incidents',
       'Number of Substantiated Complaints', 'Number of Fines',
       'Total Amount of Fines in Dollars', 'Number of Payment Denials',
       'Total Number of Penalties'],
      dtype='object')
In [29]:
nhp_q2_clean.isnull().sum().sort_values(ascending=False)
Out[29]:
Long-Stay QM Rating                                                      1681
Adjusted LPN Staffing Hours per Resident per Day                         1298
Adjusted RN Staffing Hours per Resident per Day                          1298
Adjusted Weekend Total Nurse Staffing Hours per Resident per Day         1298
Adjusted Nurse Aide Staffing Hours per Resident per Day                  1298
Adjusted Total Nurse Staffing Hours per Resident per Day                 1298
Total number of nurse staff hours per resident per day on the weekend    1230
Registered Nurse hours per resident per day on the weekend               1230
Case-Mix Total Nurse Staffing Hours per Resident per Day                 1230
Case-Mix LPN Staffing Hours per Resident per Day                         1230
Reported Physical Therapist Staffing Hours per Resident Per Day          1230
Case-Mix Nurse Aide Staffing Hours per Resident per Day                  1230
Case-Mix RN Staffing Hours per Resident per Day                          1230
Reported LPN Staffing Hours per Resident per Day                         1230
Reported Total Nurse Staffing Hours per Resident per Day                 1230
Reported Licensed Staffing Hours per Resident per Day                    1230
Reported RN Staffing Hours per Resident per Day                          1230
Reported Nurse Aide Staffing Hours per Resident per Day                  1230
QM Rating                                                                 688
Staffing Rating                                                           619
Overall Rating                                                            443
Health Inspection Rating                                                  443
Average Number of Residents per Day                                       222
Total Weighted Health Survey Score                                        187
Provider Address                                                            0
Provider Name                                                               0
City/Town                                                                   0
Legal Business Name                                                         0
Provider Resides in Hospital                                                0
Provider Type                                                               0
Number of Certified Beds                                                    0
County/Parish                                                               0
Ownership Type                                                              0
State                                                                       0
Continuing Care Retirement Community                                        0
Date First Approved to Provide Medicare and Medicaid Services               0
Abuse Icon                                                                  0
Number of Facility Reported Incidents                                       0
Number of Substantiated Complaints                                          0
Number of Fines                                                             0
Total Amount of Fines in Dollars                                            0
Number of Payment Denials                                                   0
Total Number of Penalties                                                   0
dtype: int64
In [30]:
nhp_q2_clean2 = nhp_q2_clean[['Provider Name', 'Provider Address', 'City/Town', 'State',
       'County/Parish', 'Ownership Type', 'Number of Certified Beds',
       'Average Number of Residents per Day', 'Provider Type',
       'Legal Business Name',
       'Abuse Icon', 'Overall Rating',
       'Health Inspection Rating', 'QM Rating',
       'Staffing Rating',
       'Reported Nurse Aide Staffing Hours per Resident per Day',
       'Reported LPN Staffing Hours per Resident per Day',
       'Reported RN Staffing Hours per Resident per Day',
       'Reported Licensed Staffing Hours per Resident per Day',
       'Reported Total Nurse Staffing Hours per Resident per Day',
       'Total number of nurse staff hours per resident per day on the weekend',
       'Registered Nurse hours per resident per day on the weekend',
       'Reported Physical Therapist Staffing Hours per Resident Per Day',
       'Total Weighted Health Survey Score',
       'Number of Facility Reported Incidents',
       'Number of Substantiated Complaints', 'Number of Fines',
       'Total Amount of Fines in Dollars', 'Number of Payment Denials',
       'Total Number of Penalties']]
In [31]:
nhp_q2_clean2.describe()
Out[31]:
Number of Certified Beds Average Number of Residents per Day Overall Rating Health Inspection Rating QM Rating Staffing Rating Reported Nurse Aide Staffing Hours per Resident per Day Reported LPN Staffing Hours per Resident per Day Reported RN Staffing Hours per Resident per Day Reported Licensed Staffing Hours per Resident per Day ... Total number of nurse staff hours per resident per day on the weekend Registered Nurse hours per resident per day on the weekend Reported Physical Therapist Staffing Hours per Resident Per Day Total Weighted Health Survey Score Number of Facility Reported Incidents Number of Substantiated Complaints Number of Fines Total Amount of Fines in Dollars Number of Payment Denials Total Number of Penalties
count 44563.000000 44341.000000 44120.000000 44120.000000 43875.000000 43944.000000 43333.000000 43333.000000 43333.000000 43333.000000 ... 43333.000000 43333.000000 43333.000000 44376.000000 44563.000000 44563.000000 44563.000000 4.456300e+04 44563.000000 44563.000000
mean 106.906559 81.579412 2.851677 2.786741 3.493607 2.656313 2.258376 0.874887 0.650999 1.525886 ... 3.313972 0.448443 0.069540 75.485000 1.838229 6.396629 2.155331 3.812306e+04 0.216974 2.372304
std 59.244058 48.347041 1.429560 1.273555 1.271494 1.297675 0.583376 0.366250 0.481218 0.576619 ... 0.871040 0.394832 0.085115 83.809434 3.961449 11.909506 3.960706 8.116695e+04 0.565154 4.142854
min 4.000000 1.000000 1.000000 1.000000 1.000000 1.000000 0.000000 0.000000 0.000000 0.000000 ... 0.007560 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000e+00 0.000000 0.000000
25% 66.000000 49.400000 2.000000 2.000000 3.000000 1.000000 1.896540 0.652370 0.385910 1.240130 ... 2.827870 0.239170 0.020380 25.333000 0.000000 0.000000 0.000000 0.000000e+00 0.000000 0.000000
50% 100.000000 74.100000 3.000000 3.000000 4.000000 3.000000 2.175480 0.863460 0.550400 1.430700 ... 3.169520 0.360420 0.050170 49.333000 0.000000 2.000000 1.000000 7.445750e+03 0.000000 1.000000
75% 128.000000 102.100000 4.000000 4.000000 5.000000 4.000000 2.531260 1.059040 0.776800 1.669630 ... 3.633720 0.534490 0.088810 94.333000 2.000000 7.000000 3.000000 3.481253e+04 0.000000 3.000000
max 843.000000 728.500000 5.000000 5.000000 5.000000 5.000000 10.471010 5.314250 8.705490 8.927850 ... 13.838690 8.623290 2.285620 1655.170000 91.000000 251.000000 108.000000 1.235806e+06 6.000000 108.000000

8 rows × 21 columns

In [32]:
nhp_q2_grp = nhp_q2_clean2.groupby(['Provider Name', 'Provider Address', 'City/Town', 'State',
       'County/Parish', 'Ownership Type', 'Provider Type',
       'Legal Business Name', 'Abuse Icon']).mean()
In [33]:
nhp_q2_grp = nhp_q2_grp.reset_index()
In [34]:
nhp_q2_grp.head()
Out[34]:
Provider Name Provider Address City/Town State County/Parish Ownership Type Provider Type Legal Business Name Abuse Icon Number of Certified Beds ... Total number of nurse staff hours per resident per day on the weekend Registered Nurse hours per resident per day on the weekend Reported Physical Therapist Staffing Hours per Resident Per Day Total Weighted Health Survey Score Number of Facility Reported Incidents Number of Substantiated Complaints Number of Fines Total Amount of Fines in Dollars Number of Payment Denials Total Number of Penalties
0 15 CRAIGSIDE 15 CRAIGSIDE PLACE HONOLULU HI Honolulu Non profit - Corporation Medicare CRAIGSIDE RETIREMENT RESIDENCE N 45.0 ... 4.12362 1.37277 0.09588 4.667000 0.0 0.000000 0.0 0.000000 0.0 0.0
1 24TH PLACE 600 24TH AVENUE SOUTHWEST NORMAN OK Cleveland For profit - Individual Medicare and Medicaid 24TH PLACE LLC N 89.0 ... 2.79740 0.20533 0.00272 123.333000 0.0 22.000000 1.0 5438.000000 0.0 1.0
2 60 WEST 60 WEST STREET ROCKY HILL CT Hartford For profit - Corporation Medicare and Medicaid SECURECARE OPTIONS LLC Y 95.0 ... 3.60752 0.34423 0.00954 39.333000 0.0 0.000000 0.0 0.000000 0.0 0.0
3 A GRACE SUB ACUTE & SKILLED CARE 1250 S. WINCHESTER BOULEVARD SAN JOSE CA Santa Clara For profit - Corporation Medicare and Medicaid LITA & AVA INC. N 166.0 ... 4.64830 1.02150 0.04976 82.222333 0.0 6.666667 1.0 7003.750000 0.0 1.0
4 A HOLLY PATTERSON EXTENDED CARE FACILITY 875 JERUSALEM AVENUE UNIONDALE NY Nassau Government - State Medicare and Medicaid NASSAU HEALTH CARE CORPORATION N 589.0 ... 2.31646 0.39343 0.03315 112.000333 0.0 0.000000 3.0 67301.403333 0.0 3.0

5 rows × 30 columns

In [35]:
pbj_clean = pbj[['PROVNAME', 'CITY', 'STATE', 'CY_Qtr','MDScensus', 'Hrs_RNDON', 'Hrs_RNDON_emp',
       'Hrs_RNDON_ctr', 'Hrs_RNadmin', 'Hrs_RNadmin_emp', 'Hrs_RNadmin_ctr',
       'Hrs_RN', 'Hrs_RN_emp', 'Hrs_RN_ctr', 'Hrs_LPNadmin',
       'Hrs_LPNadmin_emp', 'Hrs_LPNadmin_ctr', 'Hrs_LPN', 'Hrs_LPN_emp',
       'Hrs_LPN_ctr', 'Hrs_CNA', 'Hrs_CNA_emp', 'Hrs_CNA_ctr', 'Hrs_NAtrn',
       'Hrs_NAtrn_emp', 'Hrs_NAtrn_ctr', 'Hrs_MedAide', 'Hrs_MedAide_emp',
       'Hrs_MedAide_ctr']]
In [36]:
pbj_grp1 = pbj_clean.groupby(['PROVNAME', 'CITY', 'STATE', 'CY_Qtr']).mean()
In [37]:
pbj_grp1.reset_index(inplace=True)
pbj_grp1.head()
Out[37]:
PROVNAME CITY STATE CY_Qtr MDScensus Hrs_RNDON Hrs_RNDON_emp Hrs_RNDON_ctr Hrs_RNadmin Hrs_RNadmin_emp ... Hrs_LPN_ctr Hrs_CNA Hrs_CNA_emp Hrs_CNA_ctr Hrs_NAtrn Hrs_NAtrn_emp Hrs_NAtrn_ctr Hrs_MedAide Hrs_MedAide_emp Hrs_MedAide_ctr
0 15 CRAIGSIDE HONOLULU HI 2024Q2 43.274725 2.967033 2.967033 0.0 19.275714 19.275714 ... 0.000000 120.622747 120.622747 0.000000 1.881429 1.881429 0.0 0.000000 0.000000 0.0
1 24TH PLACE NORMAN OK 2024Q2 68.758242 5.561099 5.561099 0.0 6.428132 6.428132 ... 0.000000 119.243516 119.243516 0.000000 0.000000 0.000000 0.0 36.975824 36.975824 0.0
2 60 WEST ROCKY HILL CT 2024Q2 92.582418 5.524725 5.524725 0.0 20.681319 20.637363 ... 9.390989 244.871758 244.464835 0.406923 0.000000 0.000000 0.0 0.000000 0.000000 0.0
3 A GRACE SUB ACUTE & SKILLED CARE SAN JOSE CA 2024Q2 102.439560 4.747253 4.747253 0.0 0.000000 0.000000 ... 0.000000 279.782967 279.782967 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.0
4 A HOLLY PATTERSON EXTENDED CARE FACILITY UNIONDALE NY 2024Q2 467.384615 3.925824 3.925824 0.0 4.568681 4.568681 ... 2.060440 947.222527 927.420330 19.802198 0.000000 0.000000 0.0 0.000000 0.000000 0.0

5 rows × 29 columns

In [38]:
pbj_grp1.rename(columns = {'PROVNAME' : 'Provider Name'}, inplace = True)
In [39]:
pbj_grp1.columns
Out[39]:
Index(['Provider Name', 'CITY', 'STATE', 'CY_Qtr', 'MDScensus', 'Hrs_RNDON',
       'Hrs_RNDON_emp', 'Hrs_RNDON_ctr', 'Hrs_RNadmin', 'Hrs_RNadmin_emp',
       'Hrs_RNadmin_ctr', 'Hrs_RN', 'Hrs_RN_emp', 'Hrs_RN_ctr', 'Hrs_LPNadmin',
       'Hrs_LPNadmin_emp', 'Hrs_LPNadmin_ctr', 'Hrs_LPN', 'Hrs_LPN_emp',
       'Hrs_LPN_ctr', 'Hrs_CNA', 'Hrs_CNA_emp', 'Hrs_CNA_ctr', 'Hrs_NAtrn',
       'Hrs_NAtrn_emp', 'Hrs_NAtrn_ctr', 'Hrs_MedAide', 'Hrs_MedAide_emp',
       'Hrs_MedAide_ctr'],
      dtype='object')
In [40]:
nhp_q2_grp.columns
Out[40]:
Index(['Provider Name', 'Provider Address', 'City/Town', 'State',
       'County/Parish', 'Ownership Type', 'Provider Type',
       'Legal Business Name', 'Abuse Icon', 'Number of Certified Beds',
       'Average Number of Residents per Day', 'Overall Rating',
       'Health Inspection Rating', 'QM Rating', 'Staffing Rating',
       'Reported Nurse Aide Staffing Hours per Resident per Day',
       'Reported LPN Staffing Hours per Resident per Day',
       'Reported RN Staffing Hours per Resident per Day',
       'Reported Licensed Staffing Hours per Resident per Day',
       'Reported Total Nurse Staffing Hours per Resident per Day',
       'Total number of nurse staff hours per resident per day on the weekend',
       'Registered Nurse hours per resident per day on the weekend',
       'Reported Physical Therapist Staffing Hours per Resident Per Day',
       'Total Weighted Health Survey Score',
       'Number of Facility Reported Incidents',
       'Number of Substantiated Complaints', 'Number of Fines',
       'Total Amount of Fines in Dollars', 'Number of Payment Denials',
       'Total Number of Penalties'],
      dtype='object')
In [41]:
pbj_grp1.head()
Out[41]:
Provider Name CITY STATE CY_Qtr MDScensus Hrs_RNDON Hrs_RNDON_emp Hrs_RNDON_ctr Hrs_RNadmin Hrs_RNadmin_emp ... Hrs_LPN_ctr Hrs_CNA Hrs_CNA_emp Hrs_CNA_ctr Hrs_NAtrn Hrs_NAtrn_emp Hrs_NAtrn_ctr Hrs_MedAide Hrs_MedAide_emp Hrs_MedAide_ctr
0 15 CRAIGSIDE HONOLULU HI 2024Q2 43.274725 2.967033 2.967033 0.0 19.275714 19.275714 ... 0.000000 120.622747 120.622747 0.000000 1.881429 1.881429 0.0 0.000000 0.000000 0.0
1 24TH PLACE NORMAN OK 2024Q2 68.758242 5.561099 5.561099 0.0 6.428132 6.428132 ... 0.000000 119.243516 119.243516 0.000000 0.000000 0.000000 0.0 36.975824 36.975824 0.0
2 60 WEST ROCKY HILL CT 2024Q2 92.582418 5.524725 5.524725 0.0 20.681319 20.637363 ... 9.390989 244.871758 244.464835 0.406923 0.000000 0.000000 0.0 0.000000 0.000000 0.0
3 A GRACE SUB ACUTE & SKILLED CARE SAN JOSE CA 2024Q2 102.439560 4.747253 4.747253 0.0 0.000000 0.000000 ... 0.000000 279.782967 279.782967 0.000000 0.000000 0.000000 0.0 0.000000 0.000000 0.0
4 A HOLLY PATTERSON EXTENDED CARE FACILITY UNIONDALE NY 2024Q2 467.384615 3.925824 3.925824 0.0 4.568681 4.568681 ... 2.060440 947.222527 927.420330 19.802198 0.000000 0.000000 0.0 0.000000 0.000000 0.0

5 rows × 29 columns

Now that we have two different Q2 2024 datasets. Lets combine both datasets and start exploring to find any data insights¶
In [42]:
df_comp = pd.merge(pbj_grp1, nhp_q2_grp, on = 'Provider Name', how = 'inner')
In [43]:
pbj_grp1.shape
Out[43]:
(14562, 29)
In [44]:
nhp_q2_grp.shape
Out[44]:
(16427, 30)
In [45]:
df_comp.columns
Out[45]:
Index(['Provider Name', 'CITY', 'STATE', 'CY_Qtr', 'MDScensus', 'Hrs_RNDON',
       'Hrs_RNDON_emp', 'Hrs_RNDON_ctr', 'Hrs_RNadmin', 'Hrs_RNadmin_emp',
       'Hrs_RNadmin_ctr', 'Hrs_RN', 'Hrs_RN_emp', 'Hrs_RN_ctr', 'Hrs_LPNadmin',
       'Hrs_LPNadmin_emp', 'Hrs_LPNadmin_ctr', 'Hrs_LPN', 'Hrs_LPN_emp',
       'Hrs_LPN_ctr', 'Hrs_CNA', 'Hrs_CNA_emp', 'Hrs_CNA_ctr', 'Hrs_NAtrn',
       'Hrs_NAtrn_emp', 'Hrs_NAtrn_ctr', 'Hrs_MedAide', 'Hrs_MedAide_emp',
       'Hrs_MedAide_ctr', 'Provider Address', 'City/Town', 'State',
       'County/Parish', 'Ownership Type', 'Provider Type',
       'Legal Business Name', 'Abuse Icon', 'Number of Certified Beds',
       'Average Number of Residents per Day', 'Overall Rating',
       'Health Inspection Rating', 'QM Rating', 'Staffing Rating',
       'Reported Nurse Aide Staffing Hours per Resident per Day',
       'Reported LPN Staffing Hours per Resident per Day',
       'Reported RN Staffing Hours per Resident per Day',
       'Reported Licensed Staffing Hours per Resident per Day',
       'Reported Total Nurse Staffing Hours per Resident per Day',
       'Total number of nurse staff hours per resident per day on the weekend',
       'Registered Nurse hours per resident per day on the weekend',
       'Reported Physical Therapist Staffing Hours per Resident Per Day',
       'Total Weighted Health Survey Score',
       'Number of Facility Reported Incidents',
       'Number of Substantiated Complaints', 'Number of Fines',
       'Total Amount of Fines in Dollars', 'Number of Payment Denials',
       'Total Number of Penalties'],
      dtype='object')
In [46]:
df_comp.describe()
Out[46]:
MDScensus Hrs_RNDON Hrs_RNDON_emp Hrs_RNDON_ctr Hrs_RNadmin Hrs_RNadmin_emp Hrs_RNadmin_ctr Hrs_RN Hrs_RN_emp Hrs_RN_ctr ... Total number of nurse staff hours per resident per day on the weekend Registered Nurse hours per resident per day on the weekend Reported Physical Therapist Staffing Hours per Resident Per Day Total Weighted Health Survey Score Number of Facility Reported Incidents Number of Substantiated Complaints Number of Fines Total Amount of Fines in Dollars Number of Payment Denials Total Number of Penalties
count 16280.000000 16280.000000 16280.000000 16280.000000 16280.000000 16280.000000 16280.000000 16280.000000 16280.000000 16280.000000 ... 15997.000000 15997.000000 15997.000000 16229.000000 16280.000000 16280.000000 16280.000000 1.628000e+04 16280.000000 16280.000000
mean 83.312830 5.199100 5.104579 0.094522 10.243991 9.997557 0.246433 34.612262 31.768918 2.843343 ... 3.296696 0.442899 0.068985 77.241884 1.936384 6.645905 2.120117 3.890331e+04 0.222799 2.342916
std 48.777665 1.791369 1.860483 0.645570 11.639425 11.484236 1.378265 32.689262 29.645534 9.418801 ... 0.834300 0.382783 0.084470 84.615905 4.081128 12.138143 3.682237 8.113996e+04 0.566343 3.877087
min 1.417582 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.007560 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000e+00 0.000000 0.000000
25% 50.571429 4.857143 4.835165 0.000000 2.236813 1.991319 0.000000 15.662088 14.129533 0.000000 ... 2.827610 0.238460 0.020690 26.000000 0.000000 0.000000 0.000000 0.000000e+00 0.000000 0.000000
50% 75.884615 5.362637 5.351648 0.000000 7.230769 6.958132 0.000000 26.626484 24.664286 0.000000 ... 3.162200 0.359110 0.049820 50.444667 0.000000 2.000000 1.000000 8.190000e+03 0.000000 1.000000
75% 104.241758 5.697802 5.626374 0.000000 14.246621 13.890110 0.000000 43.646071 40.243379 1.747253 ... 3.617820 0.530890 0.088150 96.611000 2.000000 7.750000 2.666667 3.626376e+04 0.000000 3.000000
max 732.263736 57.880659 57.880659 11.440440 167.252418 167.252418 60.815934 795.917802 768.738901 336.783626 ... 10.437050 8.623290 2.285620 1274.723333 91.000000 238.333333 106.333333 1.235806e+06 5.500000 106.333333

8 rows × 46 columns

In [47]:
df_comp.isna().sum().sort_values(ascending=False).head(25)
Out[47]:
Registered Nurse hours per resident per day on the weekend               283
Reported Physical Therapist Staffing Hours per Resident Per Day          283
Reported Nurse Aide Staffing Hours per Resident per Day                  283
Reported LPN Staffing Hours per Resident per Day                         283
Reported RN Staffing Hours per Resident per Day                          283
Reported Licensed Staffing Hours per Resident per Day                    283
Total number of nurse staff hours per resident per day on the weekend    283
Reported Total Nurse Staffing Hours per Resident per Day                 283
QM Rating                                                                206
Staffing Rating                                                          171
Overall Rating                                                           143
Health Inspection Rating                                                 143
Total Weighted Health Survey Score                                        51
Average Number of Residents per Day                                       41
Hrs_RNadmin_ctr                                                            0
Hrs_RN                                                                     0
Provider Name                                                              0
CITY                                                                       0
STATE                                                                      0
CY_Qtr                                                                     0
MDScensus                                                                  0
Hrs_RNDON                                                                  0
Hrs_RNDON_emp                                                              0
Hrs_RNDON_ctr                                                              0
Hrs_RNadmin                                                                0
dtype: int64
In [48]:
df_comp_clean = df_comp.dropna(subset=["Registered Nurse hours per resident per day on the weekend", 'QM Rating', 'Overall Rating'])
In [49]:
df_comp_clean.isna().sum().sort_values(ascending=False).head(25)
Out[49]:
Provider Name       0
CITY                0
STATE               0
CY_Qtr              0
MDScensus           0
Hrs_RNDON           0
Hrs_RNDON_emp       0
Hrs_RNDON_ctr       0
Hrs_RNadmin         0
Hrs_RNadmin_emp     0
Hrs_RNadmin_ctr     0
Hrs_RN              0
Hrs_RN_emp          0
Hrs_RN_ctr          0
Hrs_LPNadmin        0
Hrs_LPNadmin_emp    0
Hrs_LPNadmin_ctr    0
Hrs_LPN             0
Hrs_LPN_emp         0
Hrs_LPN_ctr         0
Hrs_CNA             0
Hrs_CNA_emp         0
Hrs_CNA_ctr         0
Hrs_NAtrn           0
Hrs_NAtrn_emp       0
dtype: int64
In [50]:
df_comp_clean.shape
Out[50]:
(15839, 58)
Quick sanity check to make sure no nursing homes have only contract workers and no full time workers.¶
In [51]:
print(df_comp_clean[(df_comp_clean['Hrs_RNDON_ctr'] > 0) & (df_comp_clean['Hrs_RNDON'] == 0)])
print(df_comp_clean[(df_comp_clean['Hrs_RNadmin_ctr'] > 0) & (df_comp_clean['Hrs_RNadmin'] == 0)])
print(df_comp_clean[(df_comp_clean['Hrs_RN_ctr'] > 0) & (df_comp_clean['Hrs_RN'] == 0)])
print(df_comp_clean[(df_comp_clean['Hrs_LPNadmin_ctr'] > 0) & (df_comp_clean['Hrs_LPNadmin'] == 0)])
print(df_comp_clean[(df_comp_clean['Hrs_LPN_ctr'] > 0) & (df_comp_clean['Hrs_LPN'] == 0)])
print(df_comp_clean[(df_comp_clean['Hrs_CNA_ctr'] > 0) & (df_comp_clean['Hrs_CNA'] == 0)])
print(df_comp_clean[(df_comp_clean['Hrs_NAtrn_ctr'] > 0) & (df_comp_clean['Hrs_NAtrn'] == 0)])
print(df_comp_clean[(df_comp_clean['Hrs_MedAide_ctr'] > 0) & (df_comp_clean['Hrs_MedAide'] == 0)])
Empty DataFrame
Columns: [Provider Name, CITY, STATE, CY_Qtr, MDScensus, Hrs_RNDON, Hrs_RNDON_emp, Hrs_RNDON_ctr, Hrs_RNadmin, Hrs_RNadmin_emp, Hrs_RNadmin_ctr, Hrs_RN, Hrs_RN_emp, Hrs_RN_ctr, Hrs_LPNadmin, Hrs_LPNadmin_emp, Hrs_LPNadmin_ctr, Hrs_LPN, Hrs_LPN_emp, Hrs_LPN_ctr, Hrs_CNA, Hrs_CNA_emp, Hrs_CNA_ctr, Hrs_NAtrn, Hrs_NAtrn_emp, Hrs_NAtrn_ctr, Hrs_MedAide, Hrs_MedAide_emp, Hrs_MedAide_ctr, Provider Address, City/Town, State, County/Parish, Ownership Type, Provider Type, Legal Business Name, Abuse Icon, Number of Certified Beds, Average Number of Residents per Day, Overall Rating, Health Inspection Rating, QM Rating, Staffing Rating, Reported Nurse Aide Staffing Hours per Resident per Day, Reported LPN Staffing Hours per Resident per Day, Reported RN Staffing Hours per Resident per Day, Reported Licensed Staffing Hours per Resident per Day, Reported Total Nurse Staffing Hours per Resident per Day, Total number of nurse staff hours per resident per day on the weekend, Registered Nurse hours per resident per day on the weekend, Reported Physical Therapist Staffing Hours per Resident Per Day, Total Weighted Health Survey Score, Number of Facility Reported Incidents, Number of Substantiated Complaints, Number of Fines, Total Amount of Fines in Dollars, Number of Payment Denials, Total Number of Penalties]
Index: []

[0 rows x 58 columns]
Empty DataFrame
Columns: [Provider Name, CITY, STATE, CY_Qtr, MDScensus, Hrs_RNDON, Hrs_RNDON_emp, Hrs_RNDON_ctr, Hrs_RNadmin, Hrs_RNadmin_emp, Hrs_RNadmin_ctr, Hrs_RN, Hrs_RN_emp, Hrs_RN_ctr, Hrs_LPNadmin, Hrs_LPNadmin_emp, Hrs_LPNadmin_ctr, Hrs_LPN, Hrs_LPN_emp, Hrs_LPN_ctr, Hrs_CNA, Hrs_CNA_emp, Hrs_CNA_ctr, Hrs_NAtrn, Hrs_NAtrn_emp, Hrs_NAtrn_ctr, Hrs_MedAide, Hrs_MedAide_emp, Hrs_MedAide_ctr, Provider Address, City/Town, State, County/Parish, Ownership Type, Provider Type, Legal Business Name, Abuse Icon, Number of Certified Beds, Average Number of Residents per Day, Overall Rating, Health Inspection Rating, QM Rating, Staffing Rating, Reported Nurse Aide Staffing Hours per Resident per Day, Reported LPN Staffing Hours per Resident per Day, Reported RN Staffing Hours per Resident per Day, Reported Licensed Staffing Hours per Resident per Day, Reported Total Nurse Staffing Hours per Resident per Day, Total number of nurse staff hours per resident per day on the weekend, Registered Nurse hours per resident per day on the weekend, Reported Physical Therapist Staffing Hours per Resident Per Day, Total Weighted Health Survey Score, Number of Facility Reported Incidents, Number of Substantiated Complaints, Number of Fines, Total Amount of Fines in Dollars, Number of Payment Denials, Total Number of Penalties]
Index: []

[0 rows x 58 columns]
Empty DataFrame
Columns: [Provider Name, CITY, STATE, CY_Qtr, MDScensus, Hrs_RNDON, Hrs_RNDON_emp, Hrs_RNDON_ctr, Hrs_RNadmin, Hrs_RNadmin_emp, Hrs_RNadmin_ctr, Hrs_RN, Hrs_RN_emp, Hrs_RN_ctr, Hrs_LPNadmin, Hrs_LPNadmin_emp, Hrs_LPNadmin_ctr, Hrs_LPN, Hrs_LPN_emp, Hrs_LPN_ctr, Hrs_CNA, Hrs_CNA_emp, Hrs_CNA_ctr, Hrs_NAtrn, Hrs_NAtrn_emp, Hrs_NAtrn_ctr, Hrs_MedAide, Hrs_MedAide_emp, Hrs_MedAide_ctr, Provider Address, City/Town, State, County/Parish, Ownership Type, Provider Type, Legal Business Name, Abuse Icon, Number of Certified Beds, Average Number of Residents per Day, Overall Rating, Health Inspection Rating, QM Rating, Staffing Rating, Reported Nurse Aide Staffing Hours per Resident per Day, Reported LPN Staffing Hours per Resident per Day, Reported RN Staffing Hours per Resident per Day, Reported Licensed Staffing Hours per Resident per Day, Reported Total Nurse Staffing Hours per Resident per Day, Total number of nurse staff hours per resident per day on the weekend, Registered Nurse hours per resident per day on the weekend, Reported Physical Therapist Staffing Hours per Resident Per Day, Total Weighted Health Survey Score, Number of Facility Reported Incidents, Number of Substantiated Complaints, Number of Fines, Total Amount of Fines in Dollars, Number of Payment Denials, Total Number of Penalties]
Index: []

[0 rows x 58 columns]
Empty DataFrame
Columns: [Provider Name, CITY, STATE, CY_Qtr, MDScensus, Hrs_RNDON, Hrs_RNDON_emp, Hrs_RNDON_ctr, Hrs_RNadmin, Hrs_RNadmin_emp, Hrs_RNadmin_ctr, Hrs_RN, Hrs_RN_emp, Hrs_RN_ctr, Hrs_LPNadmin, Hrs_LPNadmin_emp, Hrs_LPNadmin_ctr, Hrs_LPN, Hrs_LPN_emp, Hrs_LPN_ctr, Hrs_CNA, Hrs_CNA_emp, Hrs_CNA_ctr, Hrs_NAtrn, Hrs_NAtrn_emp, Hrs_NAtrn_ctr, Hrs_MedAide, Hrs_MedAide_emp, Hrs_MedAide_ctr, Provider Address, City/Town, State, County/Parish, Ownership Type, Provider Type, Legal Business Name, Abuse Icon, Number of Certified Beds, Average Number of Residents per Day, Overall Rating, Health Inspection Rating, QM Rating, Staffing Rating, Reported Nurse Aide Staffing Hours per Resident per Day, Reported LPN Staffing Hours per Resident per Day, Reported RN Staffing Hours per Resident per Day, Reported Licensed Staffing Hours per Resident per Day, Reported Total Nurse Staffing Hours per Resident per Day, Total number of nurse staff hours per resident per day on the weekend, Registered Nurse hours per resident per day on the weekend, Reported Physical Therapist Staffing Hours per Resident Per Day, Total Weighted Health Survey Score, Number of Facility Reported Incidents, Number of Substantiated Complaints, Number of Fines, Total Amount of Fines in Dollars, Number of Payment Denials, Total Number of Penalties]
Index: []

[0 rows x 58 columns]
Empty DataFrame
Columns: [Provider Name, CITY, STATE, CY_Qtr, MDScensus, Hrs_RNDON, Hrs_RNDON_emp, Hrs_RNDON_ctr, Hrs_RNadmin, Hrs_RNadmin_emp, Hrs_RNadmin_ctr, Hrs_RN, Hrs_RN_emp, Hrs_RN_ctr, Hrs_LPNadmin, Hrs_LPNadmin_emp, Hrs_LPNadmin_ctr, Hrs_LPN, Hrs_LPN_emp, Hrs_LPN_ctr, Hrs_CNA, Hrs_CNA_emp, Hrs_CNA_ctr, Hrs_NAtrn, Hrs_NAtrn_emp, Hrs_NAtrn_ctr, Hrs_MedAide, Hrs_MedAide_emp, Hrs_MedAide_ctr, Provider Address, City/Town, State, County/Parish, Ownership Type, Provider Type, Legal Business Name, Abuse Icon, Number of Certified Beds, Average Number of Residents per Day, Overall Rating, Health Inspection Rating, QM Rating, Staffing Rating, Reported Nurse Aide Staffing Hours per Resident per Day, Reported LPN Staffing Hours per Resident per Day, Reported RN Staffing Hours per Resident per Day, Reported Licensed Staffing Hours per Resident per Day, Reported Total Nurse Staffing Hours per Resident per Day, Total number of nurse staff hours per resident per day on the weekend, Registered Nurse hours per resident per day on the weekend, Reported Physical Therapist Staffing Hours per Resident Per Day, Total Weighted Health Survey Score, Number of Facility Reported Incidents, Number of Substantiated Complaints, Number of Fines, Total Amount of Fines in Dollars, Number of Payment Denials, Total Number of Penalties]
Index: []

[0 rows x 58 columns]
Empty DataFrame
Columns: [Provider Name, CITY, STATE, CY_Qtr, MDScensus, Hrs_RNDON, Hrs_RNDON_emp, Hrs_RNDON_ctr, Hrs_RNadmin, Hrs_RNadmin_emp, Hrs_RNadmin_ctr, Hrs_RN, Hrs_RN_emp, Hrs_RN_ctr, Hrs_LPNadmin, Hrs_LPNadmin_emp, Hrs_LPNadmin_ctr, Hrs_LPN, Hrs_LPN_emp, Hrs_LPN_ctr, Hrs_CNA, Hrs_CNA_emp, Hrs_CNA_ctr, Hrs_NAtrn, Hrs_NAtrn_emp, Hrs_NAtrn_ctr, Hrs_MedAide, Hrs_MedAide_emp, Hrs_MedAide_ctr, Provider Address, City/Town, State, County/Parish, Ownership Type, Provider Type, Legal Business Name, Abuse Icon, Number of Certified Beds, Average Number of Residents per Day, Overall Rating, Health Inspection Rating, QM Rating, Staffing Rating, Reported Nurse Aide Staffing Hours per Resident per Day, Reported LPN Staffing Hours per Resident per Day, Reported RN Staffing Hours per Resident per Day, Reported Licensed Staffing Hours per Resident per Day, Reported Total Nurse Staffing Hours per Resident per Day, Total number of nurse staff hours per resident per day on the weekend, Registered Nurse hours per resident per day on the weekend, Reported Physical Therapist Staffing Hours per Resident Per Day, Total Weighted Health Survey Score, Number of Facility Reported Incidents, Number of Substantiated Complaints, Number of Fines, Total Amount of Fines in Dollars, Number of Payment Denials, Total Number of Penalties]
Index: []

[0 rows x 58 columns]
Empty DataFrame
Columns: [Provider Name, CITY, STATE, CY_Qtr, MDScensus, Hrs_RNDON, Hrs_RNDON_emp, Hrs_RNDON_ctr, Hrs_RNadmin, Hrs_RNadmin_emp, Hrs_RNadmin_ctr, Hrs_RN, Hrs_RN_emp, Hrs_RN_ctr, Hrs_LPNadmin, Hrs_LPNadmin_emp, Hrs_LPNadmin_ctr, Hrs_LPN, Hrs_LPN_emp, Hrs_LPN_ctr, Hrs_CNA, Hrs_CNA_emp, Hrs_CNA_ctr, Hrs_NAtrn, Hrs_NAtrn_emp, Hrs_NAtrn_ctr, Hrs_MedAide, Hrs_MedAide_emp, Hrs_MedAide_ctr, Provider Address, City/Town, State, County/Parish, Ownership Type, Provider Type, Legal Business Name, Abuse Icon, Number of Certified Beds, Average Number of Residents per Day, Overall Rating, Health Inspection Rating, QM Rating, Staffing Rating, Reported Nurse Aide Staffing Hours per Resident per Day, Reported LPN Staffing Hours per Resident per Day, Reported RN Staffing Hours per Resident per Day, Reported Licensed Staffing Hours per Resident per Day, Reported Total Nurse Staffing Hours per Resident per Day, Total number of nurse staff hours per resident per day on the weekend, Registered Nurse hours per resident per day on the weekend, Reported Physical Therapist Staffing Hours per Resident Per Day, Total Weighted Health Survey Score, Number of Facility Reported Incidents, Number of Substantiated Complaints, Number of Fines, Total Amount of Fines in Dollars, Number of Payment Denials, Total Number of Penalties]
Index: []

[0 rows x 58 columns]
Empty DataFrame
Columns: [Provider Name, CITY, STATE, CY_Qtr, MDScensus, Hrs_RNDON, Hrs_RNDON_emp, Hrs_RNDON_ctr, Hrs_RNadmin, Hrs_RNadmin_emp, Hrs_RNadmin_ctr, Hrs_RN, Hrs_RN_emp, Hrs_RN_ctr, Hrs_LPNadmin, Hrs_LPNadmin_emp, Hrs_LPNadmin_ctr, Hrs_LPN, Hrs_LPN_emp, Hrs_LPN_ctr, Hrs_CNA, Hrs_CNA_emp, Hrs_CNA_ctr, Hrs_NAtrn, Hrs_NAtrn_emp, Hrs_NAtrn_ctr, Hrs_MedAide, Hrs_MedAide_emp, Hrs_MedAide_ctr, Provider Address, City/Town, State, County/Parish, Ownership Type, Provider Type, Legal Business Name, Abuse Icon, Number of Certified Beds, Average Number of Residents per Day, Overall Rating, Health Inspection Rating, QM Rating, Staffing Rating, Reported Nurse Aide Staffing Hours per Resident per Day, Reported LPN Staffing Hours per Resident per Day, Reported RN Staffing Hours per Resident per Day, Reported Licensed Staffing Hours per Resident per Day, Reported Total Nurse Staffing Hours per Resident per Day, Total number of nurse staff hours per resident per day on the weekend, Registered Nurse hours per resident per day on the weekend, Reported Physical Therapist Staffing Hours per Resident Per Day, Total Weighted Health Survey Score, Number of Facility Reported Incidents, Number of Substantiated Complaints, Number of Fines, Total Amount of Fines in Dollars, Number of Payment Denials, Total Number of Penalties]
Index: []

[0 rows x 58 columns]
In [ ]:
df_comp_clean['ratio_RNDON'] = np.where(df_comp_clean['Hrs_RNDON'] != 0, df_comp_clean['Hrs_RNDON_ctr'] / df_comp_clean['Hrs_RNDON'], 0)
df_comp_clean['ratio_RNadmin'] = np.where(df_comp_clean['Hrs_RNadmin'] != 0, df_comp_clean['Hrs_RNadmin_ctr'] / df_comp_clean['Hrs_RNadmin'], 0)   
df_comp_clean['ratio_RN'] = np.where(df_comp_clean['Hrs_RN'] != 0, df_comp_clean['Hrs_RN_ctr'] / df_comp_clean['Hrs_RN'], 0)   
df_comp_clean['ratio_LPNadmin'] = np.where(df_comp_clean['Hrs_LPNadmin'] != 0, df_comp_clean['Hrs_LPNadmin_ctr'] / df_comp_clean['Hrs_LPNadmin'], 0)   
df_comp_clean['ratio_LPN'] = np.where(df_comp_clean['Hrs_LPN'] != 0, df_comp_clean['Hrs_LPN_ctr'] / df_comp_clean['Hrs_LPN'], 0)   
df_comp_clean['ratio_CNA'] = np.where(df_comp_clean['Hrs_CNA'] != 0, df_comp_clean['Hrs_CNA_ctr'] / df_comp_clean['Hrs_CNA'], 0)   
df_comp_clean['ratio_NAtrn'] = np.where(df_comp_clean['Hrs_NAtrn'] != 0, df_comp_clean['Hrs_NAtrn_ctr'] / df_comp_clean['Hrs_NAtrn'], 0)   
df_comp_clean['ratio_MedAide'] = np.where(df_comp_clean['Hrs_MedAide'] != 0, df_comp_clean['Hrs_MedAide_ctr'] / df_comp_clean['Hrs_MedAide'], 0)   
In [53]:
df_comp_clean.isna().sum().sort_values(ascending=False).head(25)
Out[53]:
Provider Name       0
CITY                0
STATE               0
CY_Qtr              0
MDScensus           0
Hrs_RNDON           0
Hrs_RNDON_emp       0
Hrs_RNDON_ctr       0
Hrs_RNadmin         0
Hrs_RNadmin_emp     0
Hrs_RNadmin_ctr     0
Hrs_RN              0
Hrs_RN_emp          0
Hrs_RN_ctr          0
Hrs_LPNadmin        0
Hrs_LPNadmin_emp    0
Hrs_LPNadmin_ctr    0
Hrs_LPN             0
Hrs_LPN_emp         0
Hrs_LPN_ctr         0
Hrs_CNA             0
Hrs_CNA_emp         0
Hrs_CNA_ctr         0
Hrs_NAtrn           0
Hrs_NAtrn_emp       0
dtype: int64
In [54]:
df_comp_clean.info()
<class 'pandas.core.frame.DataFrame'>
Index: 15839 entries, 0 to 16279
Data columns (total 66 columns):
 #   Column                                                                 Non-Null Count  Dtype  
---  ------                                                                 --------------  -----  
 0   Provider Name                                                          15839 non-null  object 
 1   CITY                                                                   15839 non-null  object 
 2   STATE                                                                  15839 non-null  object 
 3   CY_Qtr                                                                 15839 non-null  object 
 4   MDScensus                                                              15839 non-null  float64
 5   Hrs_RNDON                                                              15839 non-null  float64
 6   Hrs_RNDON_emp                                                          15839 non-null  float64
 7   Hrs_RNDON_ctr                                                          15839 non-null  float64
 8   Hrs_RNadmin                                                            15839 non-null  float64
 9   Hrs_RNadmin_emp                                                        15839 non-null  float64
 10  Hrs_RNadmin_ctr                                                        15839 non-null  float64
 11  Hrs_RN                                                                 15839 non-null  float64
 12  Hrs_RN_emp                                                             15839 non-null  float64
 13  Hrs_RN_ctr                                                             15839 non-null  float64
 14  Hrs_LPNadmin                                                           15839 non-null  float64
 15  Hrs_LPNadmin_emp                                                       15839 non-null  float64
 16  Hrs_LPNadmin_ctr                                                       15839 non-null  float64
 17  Hrs_LPN                                                                15839 non-null  float64
 18  Hrs_LPN_emp                                                            15839 non-null  float64
 19  Hrs_LPN_ctr                                                            15839 non-null  float64
 20  Hrs_CNA                                                                15839 non-null  float64
 21  Hrs_CNA_emp                                                            15839 non-null  float64
 22  Hrs_CNA_ctr                                                            15839 non-null  float64
 23  Hrs_NAtrn                                                              15839 non-null  float64
 24  Hrs_NAtrn_emp                                                          15839 non-null  float64
 25  Hrs_NAtrn_ctr                                                          15839 non-null  float64
 26  Hrs_MedAide                                                            15839 non-null  float64
 27  Hrs_MedAide_emp                                                        15839 non-null  float64
 28  Hrs_MedAide_ctr                                                        15839 non-null  float64
 29  Provider Address                                                       15839 non-null  object 
 30  City/Town                                                              15839 non-null  object 
 31  State                                                                  15839 non-null  object 
 32  County/Parish                                                          15839 non-null  object 
 33  Ownership Type                                                         15839 non-null  object 
 34  Provider Type                                                          15839 non-null  object 
 35  Legal Business Name                                                    15839 non-null  object 
 36  Abuse Icon                                                             15839 non-null  object 
 37  Number of Certified Beds                                               15839 non-null  float64
 38  Average Number of Residents per Day                                    15839 non-null  float64
 39  Overall Rating                                                         15839 non-null  float64
 40  Health Inspection Rating                                               15839 non-null  float64
 41  QM Rating                                                              15839 non-null  float64
 42  Staffing Rating                                                        15839 non-null  float64
 43  Reported Nurse Aide Staffing Hours per Resident per Day                15839 non-null  float64
 44  Reported LPN Staffing Hours per Resident per Day                       15839 non-null  float64
 45  Reported RN Staffing Hours per Resident per Day                        15839 non-null  float64
 46  Reported Licensed Staffing Hours per Resident per Day                  15839 non-null  float64
 47  Reported Total Nurse Staffing Hours per Resident per Day               15839 non-null  float64
 48  Total number of nurse staff hours per resident per day on the weekend  15839 non-null  float64
 49  Registered Nurse hours per resident per day on the weekend             15839 non-null  float64
 50  Reported Physical Therapist Staffing Hours per Resident Per Day        15839 non-null  float64
 51  Total Weighted Health Survey Score                                     15839 non-null  float64
 52  Number of Facility Reported Incidents                                  15839 non-null  float64
 53  Number of Substantiated Complaints                                     15839 non-null  float64
 54  Number of Fines                                                        15839 non-null  float64
 55  Total Amount of Fines in Dollars                                       15839 non-null  float64
 56  Number of Payment Denials                                              15839 non-null  float64
 57  Total Number of Penalties                                              15839 non-null  float64
 58  ratio_RNDON                                                            15839 non-null  float64
 59  ratio_RNadmin                                                          15839 non-null  float64
 60  ratio_RN                                                               15839 non-null  float64
 61  ratio_LPNadmin                                                         15839 non-null  float64
 62  ratio_LPN                                                              15839 non-null  float64
 63  ratio_CNA                                                              15839 non-null  float64
 64  ratio_NAtrn                                                            15839 non-null  float64
 65  ratio_MedAide                                                          15839 non-null  float64
dtypes: float64(54), object(12)
memory usage: 8.1+ MB
Our ratio columns have been successfully created, so now lets dive into the dataset¶
In [55]:
df_comp_clean.columns
Out[55]:
Index(['Provider Name', 'CITY', 'STATE', 'CY_Qtr', 'MDScensus', 'Hrs_RNDON',
       'Hrs_RNDON_emp', 'Hrs_RNDON_ctr', 'Hrs_RNadmin', 'Hrs_RNadmin_emp',
       'Hrs_RNadmin_ctr', 'Hrs_RN', 'Hrs_RN_emp', 'Hrs_RN_ctr', 'Hrs_LPNadmin',
       'Hrs_LPNadmin_emp', 'Hrs_LPNadmin_ctr', 'Hrs_LPN', 'Hrs_LPN_emp',
       'Hrs_LPN_ctr', 'Hrs_CNA', 'Hrs_CNA_emp', 'Hrs_CNA_ctr', 'Hrs_NAtrn',
       'Hrs_NAtrn_emp', 'Hrs_NAtrn_ctr', 'Hrs_MedAide', 'Hrs_MedAide_emp',
       'Hrs_MedAide_ctr', 'Provider Address', 'City/Town', 'State',
       'County/Parish', 'Ownership Type', 'Provider Type',
       'Legal Business Name', 'Abuse Icon', 'Number of Certified Beds',
       'Average Number of Residents per Day', 'Overall Rating',
       'Health Inspection Rating', 'QM Rating', 'Staffing Rating',
       'Reported Nurse Aide Staffing Hours per Resident per Day',
       'Reported LPN Staffing Hours per Resident per Day',
       'Reported RN Staffing Hours per Resident per Day',
       'Reported Licensed Staffing Hours per Resident per Day',
       'Reported Total Nurse Staffing Hours per Resident per Day',
       'Total number of nurse staff hours per resident per day on the weekend',
       'Registered Nurse hours per resident per day on the weekend',
       'Reported Physical Therapist Staffing Hours per Resident Per Day',
       'Total Weighted Health Survey Score',
       'Number of Facility Reported Incidents',
       'Number of Substantiated Complaints', 'Number of Fines',
       'Total Amount of Fines in Dollars', 'Number of Payment Denials',
       'Total Number of Penalties', 'ratio_RNDON', 'ratio_RNadmin', 'ratio_RN',
       'ratio_LPNadmin', 'ratio_LPN', 'ratio_CNA', 'ratio_NAtrn',
       'ratio_MedAide'],
      dtype='object')
In [56]:
df_comp_clean1 = df_comp_clean[['Provider Name', 'CITY', 'STATE', 'CY_Qtr', 'County/Parish', 'Ownership Type', 'Provider Type', 'Abuse Icon', 'MDScensus', 'Hrs_RNDON',
       'Hrs_RNDON_emp', 'Hrs_RNDON_ctr', 'Hrs_RNadmin', 'Hrs_RNadmin_emp',
       'Hrs_RNadmin_ctr', 'Hrs_RN', 'Hrs_RN_emp', 'Hrs_RN_ctr', 'Hrs_LPNadmin',
       'Hrs_LPNadmin_emp', 'Hrs_LPNadmin_ctr', 'Hrs_LPN', 'Hrs_LPN_emp',
       'Hrs_LPN_ctr', 'Hrs_CNA', 'Hrs_CNA_emp', 'Hrs_CNA_ctr', 'Hrs_NAtrn',
       'Hrs_NAtrn_emp', 'Hrs_NAtrn_ctr', 'Hrs_MedAide', 'Hrs_MedAide_emp',
       'Hrs_MedAide_ctr','Number of Certified Beds',
       'Average Number of Residents per Day', 'Overall Rating',
       'Health Inspection Rating', 'QM Rating', 'Staffing Rating',
       'Reported Nurse Aide Staffing Hours per Resident per Day',
       'Reported LPN Staffing Hours per Resident per Day',
       'Reported RN Staffing Hours per Resident per Day',
       'Reported Licensed Staffing Hours per Resident per Day',
       'Reported Total Nurse Staffing Hours per Resident per Day',
       'Total number of nurse staff hours per resident per day on the weekend',
       'Registered Nurse hours per resident per day on the weekend',
       'Reported Physical Therapist Staffing Hours per Resident Per Day',
       'Total Weighted Health Survey Score',
       'Number of Facility Reported Incidents',
       'Number of Substantiated Complaints', 'Number of Fines',
       'Total Amount of Fines in Dollars', 'Number of Payment Denials',
       'Total Number of Penalties', 'ratio_RNDON', 'ratio_RNadmin', 'ratio_RN',
       'ratio_LPNadmin', 'ratio_LPN', 'ratio_CNA', 'ratio_NAtrn',
       'ratio_MedAide']]
In [57]:
df_comp_clean1['STATE'].value_counts().sort_values(ascending=False)
Out[57]:
STATE
TX    1313
CA    1263
OH     970
FL     792
IL     786
IN     706
PA     692
NY     619
MO     512
MI     451
NC     434
IA     425
GA     378
MN     371
MA     363
NJ     360
WI     335
KS     317
TN     313
VA     292
KY     290
OK     285
LA     272
CO     236
MD     235
AR     220
AL     219
MS     211
WA     195
CT     195
SC     190
NE     189
AZ     145
WV     137
OR     130
UT     103
SD      99
ME      83
RI      81
ID      78
NH      78
NM      77
NV      70
ND      67
MT      63
DE      54
HI      40
WY      38
VT      33
DC      19
AK      15
Name: count, dtype: int64
In [58]:
df_comp_clean_grp = df_comp_clean1.groupby(['Provider Name', 'CITY', 'STATE', 'CY_Qtr', 'County/Parish', 'Ownership Type', 'Provider Type', 'Abuse Icon']).mean()
df_comp_clean_grp.reset_index(inplace=True)
In [121]:
# Creating a uniform color palette for the plots 
palette = sns.color_palette("Set2", n_colors=16)
sns.set_palette(palette)
sns.set_style("darkgrid")
In [122]:
sns.violinplot(x=df_comp_clean_grp["Overall Rating"])
plt.title("Distribution of Overall Ratings")
plt.xlabel("Overall Rating")
Out[122]:
Text(0.5, 0, 'Overall Rating')
No description has been provided for this image
We see a slight higher bump of 1 rating facility. Lets try to understand what is driving down the rating¶
In [123]:
sns.violinplot(x=df_comp_clean_grp["Staffing Rating"])
plt.title("Distribution of Staffing Ratings by Provider")
Out[123]:
Text(0.5, 1.0, 'Distribution of Staffing Ratings by Provider')
No description has been provided for this image
For our staffing rating we see a small number of providers that have a 5 rating but an average around 3 which is good to see.¶
In [98]:
df_comp_clean_num = df_comp_clean[['MDScensus', 'Hrs_RNDON',
       'Hrs_RNDON_emp', 'Hrs_RNDON_ctr', 'Hrs_RNadmin', 'Hrs_RNadmin_emp',
       'Hrs_RNadmin_ctr', 'Hrs_RN', 'Hrs_RN_emp', 'Hrs_RN_ctr', 'Hrs_LPNadmin',
       'Hrs_LPNadmin_emp', 'Hrs_LPNadmin_ctr', 'Hrs_LPN', 'Hrs_LPN_emp',
       'Hrs_LPN_ctr', 'Hrs_CNA', 'Hrs_CNA_emp', 'Hrs_CNA_ctr', 'Hrs_NAtrn',
       'Hrs_NAtrn_emp', 'Hrs_NAtrn_ctr', 'Hrs_MedAide', 'Hrs_MedAide_emp',
       'Hrs_MedAide_ctr','Number of Certified Beds',
       'Average Number of Residents per Day', 'Overall Rating',
       'Health Inspection Rating', 'QM Rating', 'Staffing Rating',
       'Reported Nurse Aide Staffing Hours per Resident per Day',
       'Reported LPN Staffing Hours per Resident per Day',
       'Reported RN Staffing Hours per Resident per Day',
       'Reported Licensed Staffing Hours per Resident per Day',
       'Reported Total Nurse Staffing Hours per Resident per Day',
       'Total number of nurse staff hours per resident per day on the weekend',
       'Registered Nurse hours per resident per day on the weekend',
       'Reported Physical Therapist Staffing Hours per Resident Per Day',
       'Total Weighted Health Survey Score',
       'Number of Facility Reported Incidents',
       'Number of Substantiated Complaints', 'Number of Fines',
       'Total Amount of Fines in Dollars', 'Number of Payment Denials',
       'Total Number of Penalties', 'ratio_RNDON', 'ratio_RNadmin', 'ratio_RN',
       'ratio_LPNadmin', 'ratio_LPN', 'ratio_CNA', 'ratio_NAtrn',
       'ratio_MedAide']]
In [101]:
correlation_matrix = df_comp_clean_num.corr()
plt.figure(figsize=(30, 15))
sns.heatmap(correlation_matrix, annot=True, cmap = "YlGnBu")
plt.title('Correlation Heatmap')
plt.show()
No description has been provided for this image
Good start but a little crowded. Lets see if we can simplify and look at the features that are impacting staffing rating in nursing facilities accross the US.¶
Lets see if we can limit to staffing rating¶
In [65]:
df_comp_clean_staffing_hrs2 = df_comp_clean[['Overall Rating', 'Staffing Rating',
       'Reported Nurse Aide Staffing Hours per Resident per Day',
       'Reported LPN Staffing Hours per Resident per Day',
       'Reported RN Staffing Hours per Resident per Day',
       'Reported Licensed Staffing Hours per Resident per Day',
       'Reported Total Nurse Staffing Hours per Resident per Day',
       'Total number of nurse staff hours per resident per day on the weekend',
       'Registered Nurse hours per resident per day on the weekend',
       'Total Number of Penalties', 'ratio_RNDON', 'ratio_RNadmin', 'ratio_RN',
       'ratio_LPNadmin', 'ratio_LPN', 'ratio_CNA', 'ratio_NAtrn',
       'ratio_MedAide','Hrs_RNDON',
       'Hrs_RNDON_emp', 'Hrs_RNDON_ctr', 'Hrs_RNadmin', 'Hrs_RNadmin_emp',
       'Hrs_RNadmin_ctr', 'Hrs_RN', 'Hrs_RN_emp', 'Hrs_RN_ctr', 'Hrs_LPNadmin',
       'Hrs_LPNadmin_emp', 'Hrs_LPNadmin_ctr', 'Hrs_LPN', 'Hrs_LPN_emp',
       'Hrs_LPN_ctr', 'Hrs_CNA', 'Hrs_CNA_emp', 'Hrs_CNA_ctr', 'Hrs_NAtrn',
       'Hrs_NAtrn_emp', 'Hrs_NAtrn_ctr', 'Hrs_MedAide', 'Hrs_MedAide_emp',
       'Hrs_MedAide_ctr']]
In [166]:
correlation_matrix_smaller2 = df_comp_clean_staffing_hrs2.corr()
plt.figure(figsize=(30, 15))
sns.heatmap(correlation_matrix_smaller2, annot=True, cmap="YlGnBu")
plt.title('Correlation of Staffing Levels Heatmap')
plt.show()
No description has been provided for this image
This is a much easier to read heatmap. We see for staffing rating that the reported number of nursing hours and different types of staff is the largest impact for the staffing rating.¶
In [106]:
df_state_avg = df_comp_clean1.groupby("STATE", as_index=False).mean(numeric_only=True)
In [124]:
# Set style - aligning with previous settings
sns.set_style("darkgrid")

# Metrics to analyze
metrics = ["Staffing Rating", "Average Number of Residents per Day", "Reported Total Nurse Staffing Hours per Resident per Day", "Number of Facility Reported Incidents"]

# Filter only available columns
available_metrics = [metric for metric in metrics if metric in df_state_avg.columns]

# Create subplots with correct dimensions
fig, axes = plt.subplots(len(available_metrics), 2, figsize=(14, 18))

for i, metric in enumerate(available_metrics):
    df_metric = df_state_avg[["STATE", metric]].dropna()
    df_sorted = df_metric.sort_values(by=metric, ascending=False)

    top_states = df_sorted.head(10)
    bottom_states = df_sorted.tail(10)

    sns.barplot(x=top_states[metric], y=top_states["STATE"], hue= top_states["STATE"], ax=axes[i, 0], palette= "Blues_r", legend = False)
    axes[i, 0].set_title(f"Top 10 States by {metric}")
    axes[i, 0].set_xlabel(metric)
    axes[i, 0].set_ylabel("State")

    sns.barplot(x=bottom_states[metric], y=bottom_states["STATE"], hue= bottom_states["STATE"],  ax=axes[i, 1], palette=sns.color_palette("Reds_r", n_colors=10)[::-1], legend = False)
    axes[i, 1].invert_yaxis()  
    axes[i, 1].set_title(f"Bottom 10 States by {metric}")
    axes[i, 1].set_xlabel(metric)
    axes[i, 1].set_ylabel("State")

# Adjust layout
plt.tight_layout()
plt.show()
No description has been provided for this image
From this visualization we see a breakdown by each state. It is interesting to see New York having the highest number of residents but ranked bottom 10 with staffing rating. More reason to see how to help these providers increase staffing rating for these facilities.¶
In [154]:
sns.scatterplot(x=df_comp_clean_grp['Reported Total Nurse Staffing Hours per Resident per Day'], y=df_comp_clean_grp['Staffing Rating']) 
plt.title('Total Nurse Staffing Hours vs Staffing Rating')
Out[154]:
Text(0.5, 1.0, 'Total Nurse Staffing Hours vs Staffing Rating')
No description has been provided for this image
We see a correlation with the number of nursing hours and staffing rating. So we see lower staffing ratings with a lower number of reported nurse staffing and higher rating for higher number of reported total nurse staffing hours.¶
In [155]:
sns.scatterplot(x=df_comp_clean_grp['Total Amount of Fines in Dollars'], y=df_comp_clean_grp['Staffing Rating'])
plt.title('Total Amount of Fines vs Staffing Rating')
Out[155]:
Text(0.5, 1.0, 'Total Amount of Fines vs Staffing Rating')
No description has been provided for this image
There is also a correlation with the amount of fines and staffing rating as well. Facilities with a higher staffing rating are unlikely to have fines vs. facilities with a lower staffing rating.¶
In [152]:
sns.scatterplot(
    data=df_comp_clean_grp,
    x='Number of Certified Beds',
    y='Reported Total Nurse Staffing Hours per Resident per Day',
    hue='Staffing Rating',
    palette="YlGnBu",
    alpha=0.7
)
plt.title('Total Nurse Staffing Hours vs Number of Certified Beds')
plt.xlabel('Number of Certified Beds')
plt.show()
No description has been provided for this image
This is a really good visual to see that facilities should try to stay higher than 4 nurse staffing hours per resident and less than 200 beds to maintain a high staffing level.¶
In [153]:
sns.scatterplot(x=df_comp_clean_grp['Average Number of Residents per Day'], y=df_comp_clean_grp['Reported Total Nurse Staffing Hours per Resident per Day'], hue=df_comp_clean_grp['Staffing Rating'], palette="YlGnBu")
plt.title('Total Nurse Staffing Hours vs Average Number of Residents per Day')
Out[153]:
Text(0.5, 1.0, 'Total Nurse Staffing Hours vs Average Number of Residents per Day')
No description has been provided for this image
Similar results as from our previous graph. The insights remain the same, as we would love to have facilities to have staffing at least 4 hours or more per resident and less than 200-300 residents per day to maintain a high staffing level.¶
In [156]:
sns.scatterplot(x=df_comp_clean_grp['Total Amount of Fines in Dollars'], y=df_comp_clean_grp['Reported Total Nurse Staffing Hours per Resident per Day'], hue=df_comp_clean_grp['Staffing Rating'], palette= "YlGnBu")
plt.title('Total Amount of Fines vs Total Nurse Staffing Hours') 
Out[156]:
Text(0.5, 1.0, 'Total Amount of Fines vs Total Nurse Staffing Hours')
No description has been provided for this image
In [175]:
df_comp_clean_grp[['Staffing Rating', 'Overall Rating', 'Number of Certified Beds', 'Reported Total Nurse Staffing Hours per Resident per Day','Average Number of Residents per Day', 'Total Amount of Fines in Dollars']].describe()
Out[175]:
Staffing Rating Overall Rating Number of Certified Beds Reported Total Nurse Staffing Hours per Resident per Day Average Number of Residents per Day Total Amount of Fines in Dollars
count 15516.000000 15516.000000 15516.000000 15516.000000 15516.000000 15516.000000
mean 2.673498 2.843473 107.864366 3.765816 82.295611 36946.528621
std 1.283640 1.422100 58.562363 0.916169 48.048049 75636.894681
min 1.000000 1.000000 4.000000 0.009230 2.900000 0.000000
25% 2.000000 2.000000 68.000000 3.237130 50.300000 0.000000
50% 3.000000 3.000000 100.000000 3.614915 74.800000 7900.750000
75% 4.000000 4.000000 128.000000 4.111282 102.700000 34992.750000
max 5.000000 5.000000 843.000000 11.764900 728.500000 914633.120000
Looking at the actual number the average staffing rating is 2.67 so less than 3. More important for these facilities to employ more nursing staff to provide adequate care for their residents. Total Nurse hours is less than 4 which we've seen in our visualization is less than ideal. It's nice to see the average of number of beds is ~100 which is where we want our facilities to be at.¶
In [159]:
df_state_avg.plot.bar(x='STATE', y=['Number of Certified Beds', 'Average Number of Residents per Day'],stacked = True ,title='Resident levels by State', figsize=(14, 8))
Out[159]:
<Axes: title={'center': 'Resident levels by State'}, xlabel='STATE'>
No description has been provided for this image
Good to see most facilities are filled with residents¶
In [113]:
# Stacked Bar Plot - RN staffing levels by state
df_state_avg.plot.bar(x='STATE', y=['Hrs_RN_emp', 'Hrs_RN_ctr'], stacked=True, title='RN staffing hours by state', figsize=(14, 8), color=sns.color_palette("Set2", n_colors=2))
plt.xlabel('State')
plt.ylabel('RN Staffing Hours')
plt.legend(['Employee RN Hours', 'Contract RN Hours']) 
plt.show()
No description has been provided for this image
In [114]:
 # Stacked Bar Plot - LPN staffing levels by state
df_state_avg.plot.bar(x='STATE', y=['Hrs_LPN_emp', 'Hrs_LPN_ctr'], stacked=True, title='LPN staffing hours by state', figsize=(14, 8), color = sns.color_palette("Set2", n_colors=2))
plt.xlabel('State')
plt.ylabel('LPN Staffing Hours')
plt.legend(['Employee LPN Hours', 'Contract LPN Hours']) 
plt.show()
No description has been provided for this image
In [115]:
 # Stacked Bar Plot - CNA staffing levels by state
df_state_avg.plot.bar(x='STATE', y=['Hrs_CNA_emp', 'Hrs_CNA_ctr'], stacked=True, title='LPN staffing hours by state', figsize=(14, 8), color = sns.color_palette("Set2", n_colors=2))
plt.xlabel('State')
plt.ylabel('LPN Staffing Hours')
plt.legend(['Employee LPN Hours', 'Contract LPN Hours']) 
plt.show()
No description has been provided for this image
In [151]:
 # Stacked Bar Plot - MedAide staffing levels by state
df_state_avg.plot.bar(x='STATE', y=['Hrs_MedAide_emp', 'Hrs_MedAide_ctr'], stacked=True, title='LPN staffing hours by state', figsize=(14, 8))
plt.xlabel('State')
plt.ylabel('LPN Staffing Hours')
plt.legend(['Employee LPN Hours', 'Contract LPN Hours']) 
plt.show()
No description has been provided for this image
In [117]:
 # Stacked Bar Plot - RN Admin staffing levels by state
df_state_avg.plot.bar(x='STATE', y=['Hrs_RNadmin_emp', 'Hrs_RNadmin_ctr'], stacked=True, title='RN admin staffing hours by state', figsize=(14, 8), color = sns.color_palette("Set2", n_colors=2))
plt.xlabel('State')
plt.ylabel('LPN Staffing Hours')
plt.legend(['Employee LPN Hours', 'Contract LPN Hours']) 
plt.show()
No description has been provided for this image
I was curious to see the amount of contract and full time workers used in each state. It's really hard to see any correlation as each state uses contract workers and its very similar accross the board. So far we have been exploring staffing levels and accross the states. lets start looking at facilities with high contract staffing levels and see if we find anything interesting.¶
In [143]:
df_high_cntr = df_comp_clean_grp.where((df_comp_clean_grp['ratio_RN'] > .5) | (df_comp_clean_grp['ratio_RNadmin'] > .5) | (df_comp_clean_grp['ratio_LPN'] > .5) | (df_comp_clean_grp['ratio_LPNadmin'] > .5) | (df_comp_clean_grp['ratio_CNA'] > .5) | (df_comp_clean_grp['ratio_NAtrn'] > .5) | (df_comp_clean_grp['ratio_MedAide'] > .5)).dropna()
In [165]:
ax = sns.violinplot(x=df_high_cntr["Staffing Rating"])
ax.set_title("Distribution of Staffing Ratings for Providers with High Contract Staffing Ratios")
plt.xlabel("Staffing Rating")
plt.tight_layout()
plt.show()
No description has been provided for this image
In [176]:
df_high_cntr[['Staffing Rating', 'Overall Rating', 'Number of Certified Beds', 'Reported Total Nurse Staffing Hours per Resident per Day','Average Number of Residents per Day', 'Total Amount of Fines in Dollars']].describe()
Out[176]:
Staffing Rating Overall Rating Number of Certified Beds Reported Total Nurse Staffing Hours per Resident per Day Average Number of Residents per Day Total Amount of Fines in Dollars
count 1764.000000 1764.000000 1764.000000 1764.000000 1764.000000 1764.000000
mean 2.436130 2.444492 111.822940 3.670511 82.298923 47030.859309
std 1.284896 1.356869 69.339324 0.882931 57.272033 84990.257220
min 1.000000 1.000000 16.000000 0.027910 10.800000 0.000000
25% 1.000000 1.000000 66.000000 3.192487 47.475000 216.666667
50% 2.000000 2.000000 100.000000 3.563175 70.800000 12814.585000
75% 3.000000 3.333333 134.000000 4.009790 99.650000 50095.500000
max 5.000000 5.000000 816.000000 10.090500 728.500000 740562.356667
There is a slight decrease in staffing rating if we have higher contract workers than full time staff. We still see the same reported total nursing hours per resident however we see these facilities pay a higher number of fines which is interesting.¶
In [145]:
df_high_cntr_grp = df_high_cntr.groupby("STATE", as_index=False).mean(numeric_only=True)
In [150]:
# Sort the DataFrame by 'Staffing Rating' in descending order
df_sorted = df_high_cntr_grp.sort_values(by='Staffing Rating', ascending=False)

# Plot the bar chart using the sorted DataFrame
df_sorted.plot.bar(
    x='STATE',
    y='Staffing Rating',
    title='Staffing Rating by State (High Contract Staffing)',
    figsize=(14, 8),
    legend=False 
)

plt.xlabel('State')
plt.ylabel('Staffing Rating')
plt.tight_layout()
plt.show()
No description has been provided for this image
In [149]:
sns.scatterplot(x=df_high_cntr_grp['Average Number of Residents per Day'], y=df_comp_clean_grp['Reported Total Nurse Staffing Hours per Resident per Day'], hue=df_comp_clean_grp['Staffing Rating'], palette = "YlGnBu")
Out[149]:
<Axes: xlabel='Average Number of Residents per Day', ylabel='Reported Total Nurse Staffing Hours per Resident per Day'>
No description has been provided for this image
Alaska should be studies on how its a high rated staffing level with a high number of contract workers. There is definitely more to learn out of this dataset on how Alaska is achieving those Staffing Rating levels and I would save that for future analysis. We did a very good job on cleaning each dataset and merging to understand staffing levels and improving those ratings accross states/providers. There is a clear importance to staff more nursing per resident and keep a smaller number of residents per facility. This will improve care and reduce the amount of fines per facility. There is definitely a lot more analysis that can be done for the future.¶
- Analyze how states and facilities with a high proportion of contract workers achieve high staffing ratings.
- Examine weekday vs. weekend staffing level trends to uncover potential variability in care delivery. 
- Adding additional datasets to create a timeseries analysis. We only focused on Q2 however it would be interesting to add addtional months/years to do a month over month, quarter over quarter, year over year analysis. 
- Apply machine learning techniques such as Principal Component Analysis (PCA) and K-Means clustering to identify patterns in staffing data and compare clustering results against staffing level ratings.